This manual has been prepared by the Statistical Consultants
of  the  Department  of  Computer  Science  as  documentation  for  the 
system  of  statistical  programs  known  as  SOUPAC.   SOUPAC  has  been 
written  by  the  statistical  consultants  of  this  department  in  an 
effort  to  provide  a  broad  range  of  standard  statistical  procedures 
which  are  of  use  to  the  academic  community  at  large.   Inquiries 
about  SOUPAC  should  be  directed  to  138  Digital  Computer  Laboratory. 


PROGRAM LIST

September,  1970 

AUTOCORRELATION  AND  SPECTRAL  ANALYSIS 

BALANOVA  5 

BINORMAMIN 

BISERIAL  CORRELATION 

CANONICAL  ANALYSIS 

CENTROID  FACTOR  ANALYSIS 

CLASSIFICATION 

CLIQUE  ANALYSIS 

COMMUNALITY  ESTIMATION 

CORRELATION 

DISCRIMINANT  ANALYSIS 

ECONOMETRIC  REDUCED  FORM  AND  RESIDUAL  ANALYSIS 

FREQUENCY  COUNTING  -  MEASURES  OF  ASSOCIATION 

ITERATIVE  FACTOR  ANALYSIS 

JACOBI  (Eigenvalues  and  Eigenvectors) 

K  -  CLASS  ESTIMATION 

LINEAR  PROGRAMMING 

MATRIX  OPERATIONS 

MISSING  DATA  CORRELATION 

MULTIPLE  CORRELATION 

OBLIMAX 

PARTIAL  CORRELATION 

PRINCIPAL  AXIS  FACTOR  ANALYSIS 

PROBIT  (Maximum  Likelihood  Regression) 

PROCRUSTES 

QUADRATIC  PROGRAMMING 

RANDOM  NUMBER  GENERATOR 

RANKING 

SCALOGRAM  ANALYSIS 

SQUARE  ROOT  FACTOR  ANALYSIS 

STANDARD  SCORES 

STEP-WISE  MULTIPLE  CORRELATION 

THREE  -  STAGE  LEAST  SQUARES 

TRANSFORMATIONS 

T-TESTS 

UNRESTRICTED  MAXIMUM  LIKELIHOOD  FACTOR  ANALYSIS 

VARIMAX 


NOTE:      The  programs   are   arranged  alphabetically  in  the  manual. 


January  20,  1970 


The following are temporary program data size restrictions imposed by MVT
(Multiprocessing with a Variable Number of Tasks).  The restrictions are in
effect until such time as the whole system can be reprogrammed for dynamic
storage, whereupon most programs will accept up to 450 variables.


AUTOCORRELATION          15 variables, 2500 observations, 1500 lag periods
BALANOVA 5               10 factors, 10,000 cells, 200 dependent variables
BINORMAMIN               150 x 30 matrix
BISERIAL                 20 dichotomous variables, 100 continuous variables
CANONICAL                87 maximum total variables
CENTROID                 180 x 180 matrix (input)
CLASSIFICATION           maximum of 20 groups
COMMUNALITY              175 x 175 matrix (input)
CLIQUE                   175 x 175 matrix (input)
CORRELATION              175 x 175 matrix (output)
DISCRIMINANT             70 variables, 20 groups
ECONOMETRICS             less than 150 variables
FREQUENCY                400 variables
ITERATIVE                130 x 65 matrix
JACOBI                   110 variables
LINEAR                   90 constraints, 300 variables
MATRIX                   165 x 165 matrix (input/output)
MISSING DATA             100 x 100 matrix (output)
MULTIPLE CORRELATION     150 total variables
OBLIMAX                  300 x 100 matrix (input)
PARTIAL - Tetrachoric    125 x 125 matrix (output)
PRINCIPAL AXIS           175 x 175 matrix (input)
PROBIT                   input vector length 3 < N < 3000
PROCRUSTES               190 x 70 matrix
QUADRATIC                40 variables, 80 variables plus constraints
RANDOM                   450 variables
RANK                     n x 450 where n = sample size
SCALOGRAM                490 variables
SQUARE ROOT              175 x 175 matrix (input)
STANDARD SCORES          450 variables
STEP-WISE                175 variables
TRANSFORMATION           2000 variables
T-TEST                   150 variables, 14 groups
UNRESTRICTED             75 variables, 30 factors
VARIMAX                  175 x 175 matrix


INTRODUCTION  TO  SOUPAC 
A  USER'S  GUIDE 


A SOUPAC job consists of two kinds of statements, plus data.  The first
statement type is 360 system statements, which are used to give instructions
to the 360 computer.  The second type is SOUPAC statements, which are used
to describe the types of data manipulations and statistical analyses that
you want performed.  The set of SOUPAC statements in one job is called a
SOUPAC parameter deck, and it will include references to one or more
individual SOUPAC programs.  Note that all cards should be punched on an
IBM 029 keypunch.

Every  360  system  statement  must  be  on  a  separate  card  and  must  have  the 
characters  //  or  /*  in  columns  1  and  2.   In  particular,  the  system  cards 
for  a  SOUPAC  job  are  listed  here,  in  the  order  they  must  appear: 

CARD 1      /*ID accounting information
CARD 2      // EXEC SOUPAC
CARD 3      //SYSIN DD *
            SOUPAC parameter deck
CARD 4      /*

Note  that  these  four  cards  are  an  absolute  minimum. 

CARD  1 

The  /*ID  card  is  punched  on  a  yellow  striped  card  available  in  the 
keypunch  area.   This  card  has  no  clipped  corners  and  is  used  only  as  an 
ID  card  for  jobs  to  be  run  on  the  360.   The  first  five  columns  of  this  card 
must  contain  the  characters  /*ID  followed  by  a  blank.   This  is  the  only 
card  in  your  deck  which  must  be  punched  on  a  special  card;  the  rest  of  your 
deck  may  be  punched  on  any  of  the  standard  corner  cut  cards  found  in  the 
keypunch  area. 

Sample ID cards:

/*ID PS=9999,DEPT=AGRON,NAME=SMITH

/*ID NAME='BOB_SMITH',PS=8899,DEPT=VOTEC,LINES=5000

/*ID  CODE=SWITCH,PS=9988,NAME=STEPPEIWULF,DEPT=REC 

The accounting information consists of the following keywords and
responses:

Necessary:

KEYWORD          RESPONSE

PS=              your problem specification number

DEPT=            your department

NAME=            your name

Optional:

KEYWORD          RESPONSE

TIME=(min,sec)   The optional TIME parameter on the ID card is the
TIME=(,sec)      estimate of execution time for the SOUPAC job.  By
TIME=min         default it is 1 minute.  If there is a time estimate
                 on the // EXEC SOUPAC card, there must be at least as
                 much time on the ID card.

IOREQ=n          where n is the number of INPUT/OUTPUT requests.  The
                 default is 1000.  Moving from cards to a temporary
                 storage location generates one I/O request per
                 observation.  Moving to or from a temporary storage
                 location generates two I/O requests per observation.
                 Cards read in or lines printed out do not generate
                 I/O requests.  The interactive nature of SOUPAC
                 programs often necessitates considerable amounts of
                 data transfer.  For this reason, I/O requests for
                 SOUPAC jobs are relatively large.

CODE=ZZZZZ       CODE is the signal that your problem specification
                 number is code word protected.  If you have code word
                 protection, get the code word from your approved
                 problem specification request form.  If you do not
                 have code word protection, do not use the keyword
                 CODE.

LINES=XXXXX      requests XXXXX lines of output.  The default is 2000.

CARDS=XXXX       requests XXXX punched cards.  The default is NONE.


There  must  be  no  blank  spaces  in  the  accounting  information  and  the  order 
of  the  accounting  information  makes  no  difference.   Blanks  in  a  name,  such 
as  JOHN  USER,  are  not  allowed.  The  name  must  be  only  one  word  or  of  the 
form  JOHNUSER.   If  blanks  are  desired,  then  the  name  must  be  enclosed  in 
apostrophes and an underscore substituted for the blank, e.g. 'JOHN_USER'.

Note further that commas separate keywords from responses to keywords
and that the equal sign is a part of the keyword response.  Furthermore,
your ID card parameters may not go beyond column 71.  To continue an ID card,
put a comma after the last parameter that will fit completely into 71 columns
or less and proceed on a second card which has the characters /* punched in
columns 1 and 2, followed by at least one blank.  This second card may be
any standard corner cut card.
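
The ID card rules above can be sketched in a few lines of Python.  This is
only an illustration: the function name id_cards and the constant MAX_COLUMN
are hypothetical, and the sketch applies the apostrophe-and-underscore rule
(stated above for names) to any response that contains a blank.

```python
MAX_COLUMN = 71  # ID card parameters may not go beyond this column


def id_cards(**fields):
    """Return one or more /*ID card images for the given keyword responses."""
    for required in ("PS", "DEPT", "NAME"):
        if required not in fields:
            raise ValueError("missing necessary keyword " + required)
    items = []
    for keyword, response in fields.items():
        response = str(response)
        if " " in response:
            # Blanks are not allowed: enclose in apostrophes, use underscores.
            response = "'" + response.replace(" ", "_") + "'"
        items.append(keyword + "=" + response)

    cards = []
    prefix, body = "/*ID ", ""
    for index, item in enumerate(items):
        # Every parameter except the last is followed by a comma, so a card
        # that is continued automatically ends with a comma.
        piece = item + ("," if index < len(items) - 1 else "")
        if body and len(prefix) + len(body) + len(piece) > MAX_COLUMN:
            cards.append(prefix + body)
            prefix, body = "/* ", ""  # continuation card: /* plus a blank
        body += piece
    cards.append(prefix + body)
    return cards
```

Calling id_cards(PS=9999, DEPT="AGRON", NAME="SMITH") reproduces the first
sample ID card shown earlier; a longer list of keywords is split onto a
continuation card that begins with /* and a blank.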

Further  information  about  additional  parameters  is  available  from  either 
the  Service  Programming  Area  or  the  SOUPAC  office. 

CARD  2 

The  //  EXEC  SOUPAC  is  punched  as  in  the  example  on  page  1  with  two 
slashes in columns 1 and 2, followed by at least one blank.  This is
followed  by  the  word  EXEC,  at  least  one  blank,  and  then  the  word  SOUPAC. 

There  is  in  fact  another  way  to  get  a  SOUPAC  job.   One  can  say: 

//   EXEC  SOUP 
//SYSIN  DD   * 

The  essential  differences  are  these:   SOUPAC  has  15  available  temporary 
storage  locations;  SOUP  has  only  5  temporary  storage  locations.   SOUP, 
however,  runs  about  five  seconds  faster  than  SOUPAC  and  is  generally 
recommended  for  users  who  do  not  require  more  temporary  storage  locations 
than S1 through S5.

The  amount  of  memory  to  be  used  is  specified  on  each  //  EXEC  SOUPAC 
or  //  EXEC  SOUP  card  rather  than  on  the  ID  card.   SOUPAC  jobs  are  given 
a  default  region  of  150K  which  is  adequate  for  nearly  all  SOUPAC  jobs. 
See  sections  on  options  to  a  SOUPAC  job. 

CARD  3 

The  next  card  is  copied  as  in  the  example  with  //SYSIN  beginning  in 
column  1  and  extending  through  7  without  any  blanks.   This  is  followed  by  at 
least  one  blank,  the  two  characters  DD,  at  least  one  blank,  finally  the 
character  *. 

SOUPAC  Statements 


The  SOUPAC  parameter  deck  is  divided  into  two  main  sections.   The 
first  consists  of  the  problem  program  parameter  cards.   Individual  problem 
program  write-ups  are  available  to  users  through  the  SOUPAC  office.   The 
last  card  of  the  problem  program  parameter  cards  must  always  contain  the  words 
END  SOUPAC  as  the  first  non-blank  characters.   The  END  SOUPAC  card  is 
immediately  followed  by  any  data  decks  you  may  have.   All  data  decks  are 
preceded  by  a  DATA  format  card  and  followed  by  an  END#  card.   The  format 
card  has  the  word  DATA  as  the  first  four  non-blank  characters.   After  the  word 
DATA  may  be  any  comments  the  user  may  want,  if  any.   These  comments  will 
be  printed  on  the  user's  output,  but  will  be  otherwise  ignored  by  SOUPAC. 
Next come the data card parameters.  The user must specify the number of
variables per row of data to be input, and also may optionally specify the
number of rows of data (observations).  Following this information is a
standard FORTRAN type format enclosed in parentheses.  This is the format
which describes the structure of the data deck it precedes.  The following
forms of the data format card are valid:

DATA(observations,variables)(format string <= 592 non-blank characters)

DATA(,variables)(format string <= 592 non-blank characters)

DATA(variables)(format string <= 592 non-blank characters)

Remember  that  all  data  decks  must  be  terminated  by  END#  as  the  first 
four  non-blank  characters  on  a  single  card. 

Data  used  for  calculations  should  be  read  in  either  E  or  F  format. 
Data  not  used  for  calculations  can  be  read  in  any  format. 
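
As an illustration of the data deck layout just described, the following
Python sketch wraps already-punched data card images in a DATA format card
and an END# card.  The function name data_deck is hypothetical, and the
sketch assumes the 592-character limit counts the non-blank characters of
the parenthesized format string.  It does not handle a DATA card that would
run past column 80.

```python
MAX_FORMAT_CHARS = 592  # limit on non-blank characters in the format string


def data_deck(card_images, variables, fortran_format, observations=None):
    """Return a data deck: DATA format card, data cards, then END#."""
    fmt = fortran_format.strip()
    if not (fmt.startswith("(") and fmt.endswith(")")):
        raise ValueError("format must be enclosed in parentheses")
    if len(fmt.replace(" ", "")) > MAX_FORMAT_CHARS:
        raise ValueError("format string has too many non-blank characters")
    if observations is None:
        # Form DATA(variables)(format): the number of rows is not given.
        header = "DATA(%d)%s" % (variables, fmt)
    else:
        # Form DATA(observations,variables)(format).
        header = "DATA(%d,%d)%s" % (observations, variables, fmt)
    return [header] + list(card_images) + ["END#"]
```

For two observations of two variables read in F format,
data_deck(["  1.5  2.5", "  3.0  4.0"], 2, "(2F5.1)", observations=2)
yields a four-card deck beginning with DATA(2,2)(2F5.1) and ending with END#.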

END  CARDS 

Within  a  SOUPAC  parameter  deck,  three  types  of  END  cards  are  used: 

END#:    This  card  is  used  to  end  each  data  deck  which  appears.   For  each 
DATA  format  card  used,  there  must  be  a  corresponding  END#  card 
at  the  end  of  the  data  deck. 

END S:   This card appears once and only once per SOUPAC parameter deck
and must appear as the last card in the parameter deck.  Its function
is to separate the SOUPAC parameter deck from the data decks.

END P:   This card is used to indicate the end of any program which requires
subparameter cards and must appear for all such programs.  (In this
context $-control cards are not taken to be subparameter cards.)
Any program which needs the END P card explicitly states so within
its individual program write-up.  For all other programs, use of the
END P card is an error.  If a program which uses the END P card
is also the last program to be executed, it must be terminated by an
END P card, and an END S card must then appear to indicate that no more
cards follow.

CARD 4

The  last  physical  card  in  your  deck  must  contain  the  characters  /* 
in  columns  1  and  2.   No  other  punches  should  appear  on  this  card. 

Sample  SOUPAC  deck  setup: 

/*ID NAME=SMITH,DEPT=PEM,PS=6894,LINES=5000,CARDS=500,IOREQ=10000
//   EXEC   SOUPAC 
//SYSIN  DD   * 

program  1  ()()()(  ) 

program  2  ()()()(  ) 

program  n  ()()()(  ) 

END  SOUPAC 
DATA 

END# 
/* 


Note  that  there  may  be  more  programs  than  data  decks  or  more  data  decks 
than  programs.   In  any  case,  for  programs  that  read  data  decks,  the  user 
must  insure  that  the  order  of  data  decks  corresponds  to  the  order  of 
programs  that  read  data  decks  since  the  first  program  to  read  a  data  deck 
will  read  the  first  available  deck,  and  so  on.   The  process  continues  until 
either  the  SOUPAC  program  list  is  exhausted  or  a  program  reading  data  cannot 
find  a  data  deck  available.   Notice  that  extra  data  decks  are  ignored. 
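
The matching of data decks to programs can be pictured with a short Python
sketch.  The function name assign_decks and the (name, reads_cards) pairs
are hypothetical, and the sketch assumes each program that reads cards
consumes exactly one deck.

```python
def assign_decks(programs, decks):
    """Pair card-reading programs with data decks in deck order.

    programs: list of (name, reads_cards) pairs in execution order.
    decks:    list of deck labels in the order they appear after END S.
    Returns (assignments, unused_decks).
    """
    remaining = list(decks)
    assignments = []
    for name, reads_cards in programs:
        if not reads_cards:
            continue  # this program takes its input from another address
        if not remaining:
            break  # a program reading data finds no deck: processing stops
        assignments.append((name, remaining.pop(0)))
    return assignments, remaining  # leftover decks are simply ignored
```

With three decks and only two programs that read cards, the third deck is
returned as unused, which mirrors the statement that extra data decks are
ignored.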

A  FINAL  NOTE 

Any  changes  the  user  may  want  to  specify  in  the  job  control  language 
of  SOUPAC  must  follow  the  standard  OS/360  conventions. 


OPTIONS  TO  A  SOUPAC  JOB: 
PARMS,  PROLOG  CARDS,  AND  $-CONTROL  CARDS 


A.   PARMS 


PARMS  are  arguments  to  the  keyword  'Parameter,'  contracted  into 
the  keyword  'PARM' ,  which  give  instructions  to  a  processor  running  under  a 
360-system.   In  this  context,  SOUPAC  is  a  processor  running  under  a  360 
system.   PARMS  are  always  coded  on  an  EXEC  card  and  have  the  following  form: 

// EXEC SOUPAC,PARM='OPT1,OPT2,...,OPTm'

The permissible options to be used as SOUPAC PARMS are listed below with an
explanation of their use and function.  Note that the default is underlined,
that is, // EXEC SOUPAC is equivalent to // EXEC SOUPAC,PARM='OPT1,OPT2,...',
where  the  underlined  PARM  is  to  be  taken  as  one  of  the  list  of  options  in  the 
PARM  string  in  the  example.   These  PARMS  give  the  SOUPAC  system  instructions 
in  the  same  way  that  parameters  give  SOUPAC  statistical  or  data  management 
programs  instructions. 

1.  NODYNAM  or  DYNAM 

NODYNAM implies that a non-dynamically allocatable version of the
library  of  statistical  procedures  is  to  be  used.   This  version  will 
run  in  some  150K  of  core  and  will  handle  a  lesser  number  of 
variables  than  the  dynamically  allocatable  version.   DYNAM  will  use 
the  dynamically  allocatable  version  of  any  program  requested  which 
will  handle  more  variables  in  an  arbitrarily  specified  amount  of 
core  above  a  certain  minimum.   If  using  DYNAM,  see  the  SOUPAC 
consultants  for  a  handout  on  optimal  region  sizes  for  particular 
numbers  of  variables. 

2.  EXECUTE  or  NOEXECUTE 

NOEXECUTE  implies  that  the  SOUPAC  parameter  deck,  for  which  the 
Syntax  Interpreter  is  to  scan  and  build  intermediate  parameters, 
should  not  be  executed.   NOEXECUTE  indicates  that  only  a  syntax 
check  is  to  be  performed.   If  EXECUTE  is  specified  and  no  errors 
are  found  by  the  Syntax  Interpreter,  the  job  step  will  proceed. 
If EXECUTE is specified and errors are found by the Syntax
Interpreter, execution of the step may continue depending upon
whether LET or NOLET is also specified.

3.  NOLET  or  LET 

If  an  error  is  found  by  the  Syntax  Interpreter  and  EXECUTE  has 
been  specified,  execution  will  proceed  only  if  LET  was  also 
specified.   In  this  case,  execution  will  proceed  only  through  the 
last  program  processed  which  was  completely  error  free.   If  NOLET 
was  specified  and  errors  are  found  by  the  Syntax  Interpreter, 
execution  will  not  be  permitted. 

4.  LIST or NOLIST

LIST  indicates  that  all  program  cards  are  to  be  listed.   NOLIST 
indicates  that  only  the  prolog  section  of  the  SOUPAC  parameter 
deck  is  to  be  listed. 


5.  PGM or NOPGM

PGM  indicates  that  a  complete  SOUPAC  parameter  deck  and  data  decks 
follow.   NOPGM  indicates  that  only  the  prolog  section  and  data 
deck  follow,  and  that  the  intermediate  parameters  are  being  provided 
by  the  user  by  over-riding  the  cataloged  procedure.   This  implies 
that  the  user  has  previously  run  a  SOUPAC  job  and  has  saved  the 
two  necessary  data  sets  so  that  he  may  run  the  same  program  again. 
To save these data sets correctly, a user should first visit
the SOUPAC office.

If  any  error  is  found  by  the  Syntax  Interpreter  in  the  prolog  section, 
the  job  step  will  not  continue. 

If  the  job  step  which  generated  the  intermediate  parameter  data 
sets  found  syntax  errors,  execution  of  the  job  step  in  which 
NOPGM  is  specified  will  continue  (if  EXECUTE  is  specified)  through 
the last program processed which was completely error free,
regardless of whether LET or NOLET was specified in either job step.

Examples:

To do just a syntax check:

// EXEC SOUPAC,PARM='NOEXECUTE'

To execute up to the first program found to have syntax errors:

// EXEC SOUPAC,PARM='LET'

To execute up to the first program found to have syntax errors and
use the dynamically allocatable library:

// EXEC SOUPAC,PARM='LET,DYNAM'

Note  that  the  PARMS  may  be  listed  in  any  order. 
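
The interplay of EXECUTE, LET and syntax errors described above can be
summarized in a small Python sketch.  The function name programs_to_run is
hypothetical.  Its defaults (execute=True, let=False) are an assumption
inferred from the examples, in which PARM='LET' alone is enough to permit
execution.  The NOPGM case is not modeled.

```python
def programs_to_run(n_programs, first_error=None, execute=True, let=False,
                    prolog_error=False):
    """Return how many programs, counted from the first, will be executed.

    first_error is the zero-based position of the first program in which
    the Syntax Interpreter found an error, or None if there were no errors.
    """
    if prolog_error:
        return 0  # an error in the prolog always stops the job step
    if not execute:
        return 0  # NOEXECUTE: syntax check only
    if first_error is None:
        return n_programs  # no errors: the whole deck runs
    # With errors, LET allows execution through the last error-free program
    # processed; NOLET allows none.
    return first_error if let else 0
```

For a deck of three programs with an error in the second,
programs_to_run(3, first_error=1, let=True) gives 1, while the same deck
under NOLET gives 0.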

B.   PROLOG  OF  A  SOUPAC  JOB 

Described  below  are  several  #  control  cards  which  may  appear  in  the 
prolog  of  a  SOUPAC  job.   Within  the  prolog  these  control  cards  may 
appear  in  any  order.   If  prolog  control  cards  are  used,  they  must  appear 
immediately  after  the  SYSIN  card.   The  Syntax  Interpreter  determines 
the  end  of  the  prolog  when  it  reads  a  card  which  is  not  one  of  these 
types.   All  types  have  parameters  and  must  be  terminated  by  a  period. 
Prolog  cards  may  not  have  continuation  cards,  hence  all  parameter 
information  must  be  punched  within  80  columns.   There  is  no  limit 
to  the  number  of  prolog  cards  permitted  nor  is  there  any  restriction  on 
the  number  of  any  one  type.   If  conflicting  information  is  entered,  the 
information  entered  last  overrides  any  previous  definitions. 

1.   #REPEAT OPTION

The #REPEAT OPTION is used to repeat sections of a SOUPAC
parameter deck an optional number of times.  The #REPEAT card
which appears in the prolog section will be followed by up to
22 (twenty-two) integer parameters which will indicate the number
of repetitions of up to 22 repeat sequences.  The card sequences to
be repeated will be preceded and followed by #SREP and #EREP
cards respectively.  Example:

/*ID
// EXEC SOUP
//SYSIN DD *
#REPEAT (2).

<additional program cards>
#SREP
CORRELATION (C)( )(S1).
SQUARE ROOT FACTOR ANALYSIS (S1)(P(F))(20)(C)(P(F)).
#EREP
END S

In  this  example  the  program  sequence  of  CORRELATION  and  SQUARE  ROOT 
FACTOR  ANALYSIS  will  be  repeated  twice.   Four  card  input  data 
sets  would  be  required  for  the  repeated  sections. 

Repeat  sequences  which  begin  before  a  main  program  and  end  in 
a  subprogram  or  which  begin  in  a  subprogram  and  do  not  end  in  the 
same  subprogram  are  not  allowed.   Nested  or  overlapping  repeat  sequences 
are  not  allowed.   Also  a  #SREP  card  cannot  be  immediately  followed 
by  a  #EREP  card  and  a  single  appearance  in  the  deck  of  either  card 
will  cause  an  error. 
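
The effect of #REPEAT can be sketched in Python as a simple expansion of
the card deck.  The function name expand_repeats is hypothetical, and the
sketch assumes that the i-th #REPEAT parameter applies to the i-th
#SREP/#EREP pair in deck order.  It checks only the restrictions that can
be seen from the cards themselves (nesting, empty sequences, unmatched
cards).

```python
MAX_SEQUENCES = 22  # a #REPEAT card carries at most 22 parameters


def expand_repeats(cards, counts):
    """Expand #SREP ... #EREP sequences by the given repetition counts."""
    if len(counts) > MAX_SEQUENCES:
        raise ValueError("at most 22 repeat sequences are allowed")
    expanded, block = [], []
    in_block, sequence = False, 0
    for card in cards:
        tag = card.strip()
        if tag == "#SREP":
            if in_block:
                raise ValueError("nested repeat sequences are not allowed")
            in_block, block = True, []
        elif tag == "#EREP":
            if not in_block:
                raise ValueError("#EREP without a preceding #SREP")
            if not block:
                raise ValueError("#SREP immediately followed by #EREP")
            if sequence >= len(counts):
                raise ValueError("no #REPEAT parameter for this sequence")
            expanded.extend(block * counts[sequence])
            in_block, sequence = False, sequence + 1
        elif in_block:
            block.append(card)
        else:
            expanded.append(card)
    if in_block:
        raise ValueError("#SREP without a matching #EREP")
    return expanded
```

Applied to the example above with counts [2], the CORRELATION and SQUARE
ROOT FACTOR ANALYSIS cards each appear twice in the result, and the four
resulting (C) addresses account for the four card input data sets.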

2.   #V-UNIT  OPTION 

#V-UNIT allows the user to change input and output addresses
in the execution of one SOUPAC job.  The form of a #V is as follows:

#Vn(m)(A1)...(Ak).

where n is an integer 1 through 9; thus there can be at most 9
variable addresses, namely V1 through V9; and m is a counter which
determines how many times a variable address may be used before it
assumes the next value in its list of possible values.  A1 through Ak
are addresses which Vn assumes.  These can be any valid address.
At the moment, however, forms like (S1/P) will not work.  Note that
CARDS and PRINT are permitted.

Finally, the list of addresses is cyclic; that is, if, after
Ak has been used, Vn occurs again in the program, Vn will have the
value A1, and so on.

/*ID
// EXEC SOUP
//SYSIN DD *
#V9(1)(S1)(S2)(S3)(S4).
#V5(1)(S1)(S2)(S3)(S4).
#REPEAT (4).
MAT.
#SREP
MOV (C)(V9).
#EREP
HOR (V5)(V5)(V5)(V5)(S5).
END P
END S

This program segment reads 4 separate card decks, saving them in
temporary storage, and horizontally augments them into one data set.

The equivalent without the use of #REPEAT and #V would be as follows:

/*ID
// EXEC SOUP
//SYSIN DD *
MAT.
MOV(C)(S1).
MOV(C)(S2).
MOV(C)(S3).
MOV(C)(S4).
HOR(S1)(S2)(S3)(S4)(S5).
END P
END S

Note that the MOV(C)(V9). statement is expanded into four move statements
and V9 takes the values S1 through S4.  Similarly, V5 takes on the
values of S1 through S4.
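
The cyclic behavior of a variable address can be illustrated with a Python
sketch.  The names VUnit and substitute_v_units are hypothetical, and the
sketch assumes the cards have already been expanded by #REPEAT and that each
(Vn) is resolved left to right, top to bottom.

```python
import re


class VUnit:
    """One #Vn card: a cyclic address list with a use counter m."""

    def __init__(self, uses_per_address, addresses):
        self.uses_per_address = uses_per_address
        self.addresses = list(addresses)
        self.uses = 0

    def take(self):
        # Advance to the next address after every m uses, wrapping around.
        index = (self.uses // self.uses_per_address) % len(self.addresses)
        self.uses += 1
        return self.addresses[index]


def substitute_v_units(cards, units):
    """Replace each (Vn) in the cards with the address Vn currently holds."""
    pattern = re.compile(r"\((V[1-9])\)")

    def replace(match):
        name = match.group(1)
        if name not in units:
            return match.group(0)  # leave undefined variable addresses alone
        return "(" + units[name].take() + ")"

    return [pattern.sub(replace, card) for card in cards]
```

With V9 and V5 both defined as (1)(S1)(S2)(S3)(S4), four copies of
MOV (C)(V9). become moves to S1 through S4, and the HOR card receives S1
through S4 as its inputs, matching the expanded deck shown above.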

3.  #OLD  OPTION 

The  #OLD  option  is  used  to  define  the  number  of  rows  in  a 
sequential data set created by a previously run SOUPAC job.  The
number  of  rows  is  then  entered  into  a  table  in  the  monitor.   This 
option  should  be  used  whenever  the  header  record  on  the  data  set 
is not known to have a correct value for the number of rows, and the
user  does  not  want  to  execute  a  MATRIX  MOVE  to  count  the  rows.   To 
use  the  option,  punch  a  card  with  #OLD  in  the  first  four  columns. 
Then  code  the  address  and  the  number  of  rows  in  the  usual  SOUPAC 
fashion.   The  number  of  columns  may  be  coded  on  the  card  if  desired, 
but  will  be  totally  ignored.   Include  this  card  in  the  prolog  section 
of  the  SOUPAC  job. 

For  example,  to  indicate  that  a  data  set  to  be  input  from 
SEQUENTIAL  1  has  77  rows  you  would  prepare  the  following  card: 

#OLD (S1)(77).


4.  #TEST OPTION

There  is  also  available  a  #TEST  option;  however,  this  facility 
is  complicated  and  intended  for  testing  purposes  within  the  SOUPAC 
office  and  has  no  significant  advantage  for  the  general  user. 

5.   #DEFINE  OPTION 

To specify the dimensions of a direct access data set (DISK address),
punch #DEFINE in the first seven columns of a card, followed by the
address, number of rows and number of columns coded in the usual SOUPAC
fashion.  Include this card in the prolog section of your program.  For
double precision matrices, code the same number of rows, but twice as
many columns as otherwise.  DISK 1 and DISK 2 have default definitions of
450 rows by 450 columns single precision.  If the user desires any other
dimensions on these data sets, #DEFINE must be used.  If the user desires
to use any DISK address other than DISK 1 and DISK 2, #DEFINE must be
used in addition to supplying the necessary DD cards.

For example, to define a data set for DISK 17 with 20 rows and 40
columns double precision, you would prepare the following card:

#DEFINE  (DISK  17)(20)(80). 
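
The column-doubling rule for double precision can be made concrete with a
short Python sketch.  The function name define_card is hypothetical; the
card layout follows the example above.

```python
def define_card(disk, rows, columns, double_precision=False):
    """Return a #DEFINE prolog card for a direct access data set."""
    # Double precision matrices are coded with the same number of rows
    # but twice as many columns.
    coded_columns = columns * 2 if double_precision else columns
    card = "#DEFINE (DISK %d)(%d)(%d)." % (disk, rows, coded_columns)
    if len(card) > 80:
        # Prolog cards may not have continuation cards.
        raise ValueError("prolog card exceeds 80 columns")
    return card
```

Here define_card(17, 20, 40, double_precision=True) reproduces the card
shown in the example.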

Notice  that  all  prolog  cards  start  with  a  #  in  column  one  and  must  occur 
before  any  SOUPAC  program  parameter  cards.  A  #-card  in  the  middle  of 
the  SOUPAC  program  parameter  deck  is  treated  as  a  comment.   There  is, 
however, one #-control card which, while not strictly a prolog card, may
occur  in  the  SOUPAC  program  parameter  deck  and  will  not  be  treated  as 
a  comment.   This  is  the  #-zero  card  and  is  the  only  exception  to  the 
statement  about  #  cards  being  comments  if  in  the  middle  of  the  deck. 
The  #-zero  card  is  essentially  a  debugging  tool  to  facilitate  reading 
of  dumps  if  one  is  needed.   It  has  no  particular  use  for  the  user. 


C.   $-CONTROL CARDS

$-CONTROL CARDS are used to provide additional information
to a SOUPAC program above and beyond what is included in the parameters.
There are 3 $-control cards.  All must begin in column one with the character
$ and then continue across the card without blank columns.

1.   $C-B 

The $C-B card provides as its arguments the variables to
be used as control breaks for a program which accepts control
breaks.  The use of this card with a program which does
not accept control breaks is an error.  The form of this card
is as follows:

$C-B(V1)(V2)...(Vn).

where V1 through Vn are variable numbers and n must be less than or
equal to 24.


2.  $INP 

$INP  has  as  its  arguments  a  string  of  input  addresses. 
The  form  is: 

$INP(A1)...(An).

where A1 through An are input addresses including cards.
The number of addresses will be determined by the program
accepting the $INP card and will be explicitly mentioned in the
program write-up.

3.  $OUT

$OUT has as its arguments a string of output addresses.  The
form is the same as that for $INP:

$OUT(A1)...(An).

The number of addresses is also determined by the program accepting
the $OUT card.  Multiple output addresses will be accepted.  See section
on Input/Output multiple addresses.


SOUPAC  INPUT-OUTPUT  AND  TEMPORARY  STORAGE 

I.  GENERAL 

A.  Input  and  Output  as  Data  Types 

Consider  a  set  of  data  which  a  researcher  wants  intercorrelated. 
To do correlations there is in the SOUPAC library of statistical procedures
a  correlation  program.   Input  to  the  correlation  program  is  the  researcher's 
raw  data;  output  from  the  correlation  program  is  a  matrix  of  correlation 
coefficients.  Similarly,  every  conceivable  program  has  a  particular  input; 
in  fact,  perhaps  several  inputs,  and  some  output. 

The  nature  of  the  input  and  output  of  a  particular  program  will 
depend  on  the  program  and  its  intent.   For  example,  raw  data  variables 
are  input  into  a  correlation  program  which  outputs  a  correlation  matrix. 
But  a  factor  analysis  program  expects  as  input  a  correlation  matrix, 
and  yields  as  output  a  factor  matrix.   In  contrast  to  the  singular 
relation  of  the  nature  of  input  and  output  to  a  particular  statistical 
program,  every  program  finds  its  input  somewhere  and  must  put  its  output 
somewhere.

B.  Input  and  Output  as  Data  Sources 

SOUPAC  is  designed  in  such  a  manner  that  the  researcher  can  tell 
any  program  where  his  inputs  are  and  where  to  put  his  outputs.   Punched  cards 
are  an  obvious  input  source;  printed  pages  are  an  obvious  output  source. 
But  the  nature  of  a  punched  card  deck  input  into  a  correlation  program  would 
be  that  of  raw  data  variables.   In  the  SOUPAC  system  input  and  output  sources 
are  also  called  addresses.   Thus,  a  possible  input  address  for  a  correlation 
program  is  cards  and  a  possible  output  address  for  correlation  coefficients 
is  print.   Input  and  output  addresses  are  parameters  to  every  program  in 
the  SOUPAC  system.   As  the  researcher  reads  a  particular  program  write-up 
he  will  notice  that  the  order  of  the  parameters  determines  the  nature  of 
his  input  or  output  and  his  supplying  an  input  or  output  address  determines 
whether  or  not  he  uses  or  gets  the  particular  inputs  and  outputs. 

II.  ELEMENTARY  INPUT/OUTPUT  ADDRESS  AND  TEMPORARY  STORAGE 

A.   Possible  elementary  input  and  output  addresses  in  the  SOUPAC 
system  are  these: 

INPUT:    CARDS, SEQUENTIAL 1, SEQUENTIAL 2, . . . , SEQUENTIAL 15

OUTPUT:   PRINT, SEQUENTIAL 1, SEQUENTIAL 2, . . . , SEQUENTIAL 15
          (See section on punched cards)


Again,  CARDS  and  PRINT  are  obvious  sources.   SEQUENTIAL  1  through 
SEQUENTIAL  15,  however,  are  input  or  output  names  of  15  temporary  storage 
regions  available  to  the  researcher  in  the  SOUPAC  system.   These  15 
temporary  storage  regions  are  provided  for  exactly  that  purpose,  temporary 
storage of data.  Notice that with this facility a user can save his
correlation matrix, for example, at SEQUENTIAL 1 and then give SEQUENTIAL
1  as  an  input  address  to  a  factor  analysis  program.   Or  a  researcher  can 
construct  a  copy  of  his  data  on  temporary  storage  and  then  let  any  number 
of  programs  use  the  same  data  as  input  from  the  same  input  address,  saving 
him  the  effort  of  making  multiple  copies  of  his  card  deck  so  that  each 
program  would  read  its  own  deck.   Finally,  temporary  storage  addresses 
enable  the  saving  of  intermediate  results  for  further  processing  or 
modification  by  other  programs  and  thereby  enable  the  researcher  to 
construct  his  own  analysis  procedure  by  providing  the  appropriate  inputs 
and  outputs  to  the  right  programs  at  the  right  times. 

B.   SOUP vs SOUPAC with Respect to Temporary Storage

There  are,  however,  two  ways  of  invoking  the  SOUPAC  system.   One 
can  ask  for  SOUPAC  or  SOUP.   Notice  that  to  SOUPAC  there  are  provided  all 
15  temporary  storage  regions;  to  SOUP  there  are  provided  only  5,  namely 
SEQUENTIAL 1 through SEQUENTIAL 5.  Asking for SEQUENTIAL 6 through
SEQUENTIAL  15  when  running  under  SOUP  will  cause  an  error  and  terminate 
the  job. 

All of these input-output addresses may be abbreviated as follows:

CARDS              C
PRINT              P
SEQUENTIAL 1       S1 (or T1)
  . . .
SEQUENTIAL 15      S15 (or T15)

T1 through T15 are alternative abbreviations for SEQUENTIAL 1 through
SEQUENTIAL 15.  T1 through T15 are, in fact, abbreviations of TAPE 1
through TAPE 15.  SEQUENTIAL 1 through SEQUENTIAL 15 and their abbreviations
are the recommended uses.  The T1 through T15 notation reflects no real
technical distinction but has been kept to enable programs using that notation
to run.
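
The abbreviations and the SOUP versus SOUPAC limits can be combined in one
small Python sketch.  The function name full_address and the table
SEQUENTIAL_LIMIT are hypothetical; the limits themselves (15 regions under
SOUPAC, 5 under SOUP) are those stated above.

```python
import re

SEQUENTIAL_LIMIT = {"SOUPAC": 15, "SOUP": 5}  # temporary storage regions


def full_address(abbreviation, procedure="SOUPAC"):
    """Expand C, P, Sn or Tn to the full address name, checking the range."""
    text = abbreviation.strip().upper()
    if text == "C":
        return "CARDS"
    if text == "P":
        return "PRINT"
    match = re.fullmatch(r"[ST](\d{1,2})", text)
    if match is None:
        raise ValueError("unrecognized address: " + abbreviation)
    number = int(match.group(1))
    if not 1 <= number <= SEQUENTIAL_LIMIT[procedure]:
        # For example, SEQUENTIAL 6 under SOUP terminates the job.
        raise ValueError("SEQUENTIAL %d is not available under %s"
                         % (number, procedure))
    return "SEQUENTIAL %d" % number
```

Under this sketch T15 and S15 both expand to SEQUENTIAL 15 when running
under SOUPAC, while S6 is rejected when the procedure is SOUP.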

C.   Multiple  Output  Addresses 

A  researcher  may  want  to  output  to  several  sources:   he  may  desire 
to  both  print  and  save  some  results  for  later  use.   He  cannot,  however, 
input  from  more  than  one  source  for  a  particular  input  address.   The  facility 
of  multiple  output  addresses  has  the  following  construction: 

(output address 1/output address 2/output address 3).

This  is  the  completely  general  form  providing  for  up  to  three  separate 
outputs.  Each output must be a different source, however.  Thus, (S1/P/X)
is a valid multiple output address providing for temporary storage at S1,
a print of the same data, and a punched copy of the data.  (See section on
punch for explanation of X).  (S1/P) will print and store but not punch.
The order of the addresses makes no difference.  (S1/P) is equivalent to
(P/S1).  Forms such as (S1/S2), however, are not permitted, nor are (P/P)
or (X/X): one can output only to one sequential and only once to P or X.

The above general form is available only if the output address in
the  particular  program  is  marked  with  an  P.. 

In  all  cases,  however,  the  form 

(output  address  l/output  address  2) 

is  valid  unless  the  program  write-up  explicitly  has  a  restriction. 
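
The rules above for multiple output addresses can be sketched as a small check. This is only an illustration in Python, not part of SOUPAC; the function name check_output_address is hypothetical, and the sketch assumes the abbreviated forms (S1-S15 or T1-T15, P, X) with an optional (E) or (F) suffix on an address.

```python
import re

def check_output_address(address: str) -> list[str]:
    """Return the rule violations found in an address such as '(S1/P/X)'."""
    text = address.strip().upper()
    # Remove exactly one pair of enclosing parentheses, if present.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    # Drop an (E) or (F) print-format suffix before classifying each part.
    parts = [re.sub(r"\([EF]\)$", "", p.strip()) for p in text.split("/")]

    problems = []
    if len(parts) > 3:
        problems.append("more than three outputs")
    sequential = [p for p in parts if re.fullmatch(r"[ST]\d+", p)]
    if len(sequential) > 1:
        problems.append("more than one sequential address")
    for single in ("P", "X"):
        if parts.count(single) > 1:
            problems.append(f"{single} used more than once")
    return problems

print(check_output_address("(S1/P/X)"))  # no violations
print(check_output_address("(S1/S2)"))   # two sequential addresses
```

The sketch does not know which programs accept three outputs or the X address; that still has to be read from the individual program write-up.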

D.   Print  is  F  Form  of  Output  Print  Address. 

There  is  yet  another  form  to  output  addresses.   This  form  is 
available  only  where  the  researcher  finds  the  symbol  fi  in  the  program  write- 
up  and  has  to  do  with  the  kind  of  printed  output.   For  technical  reasons, 
most  programs  print  in  a  form  called  E-format  which  is  a  form  of 
scientific  notation.   This  form  allows  the  computer  to  print  numbers  of 
any  size.   Some  programs,  for  which  the  output  numbers  are  known  to  be 
constrained,  as  in  correlation  coefficients,  however,  print  in  a  form 
called F-format which is ordinary decimal number representation.  F-format
generally  cannot  print  numbers  larger  than  a  pre-determined  size.   The 
size  of  number  depends  on  the  nature  of  a  researcher's  data,  but  the  program 
has  no  way  of  knowing  this,  hence,  the  most  general  form,  E-format  is  used. 

The researcher can, however, optionally specify F-format.  To print in
F-format he would use the following output address:

(P(F))  or  (P(F)/S1) if he wanted a multiple output address.

Those programs which print in F-format already, as for
correlation coefficients, can be made to print in E-format by using the
following output address:

(P(E))  or  (P(E)/S15).

The different forms look like this:

E-format        Scientific Notation    F-format         Decimal Number
                                                        Representation

+0.12345E 06    +1.2345 x 10^5         +123456.12345    +123456.12345

All four numbers have the same value correct to 5 places.  Notice that F-format
cannot represent a number greater than 999999.99999 in absolute value
whereas E-format can represent the first 5 digits of any number of order of
magnitude up to 10^99.  The numbers of digits illustrated for E and F
formats are the pre-determined limits for the size of numbers.  E-format is
the more general form but F-format is easier to read.


In E-format, a part such as E 02 is to be understood as
10^2.  E 03 would be 10^3 and E-04 would be 10^-4.  Thus, .376 E 03 is
.376 x 10^3 or 376. while .1294E-01 is .1294 x 10^-1 or .01294.  The sign
following the E determines which way to move the decimal point: left for
negative, right for blank or positive.  The number following the sign or
blank determines how many places to move the decimal point.
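
The decimal-point rule just described can be checked with a few lines of Python. This is a minimal sketch for reading printed E-format fields by hand, not SOUPAC code; the function name e_format_value is hypothetical.

```python
from decimal import Decimal

def e_format_value(field: str) -> Decimal:
    """Convert a printed E-format field such as '.376 E 03' to its value."""
    mantissa_text, exponent_text = field.upper().split("E")
    # Blanks inside the field (including a blank sign position) are ignored.
    mantissa = Decimal(mantissa_text.replace(" ", ""))
    exponent = int(exponent_text.replace(" ", "") or "0")
    # scaleb moves the decimal point: right for positive, left for negative.
    return mantissa.scaleb(exponent)

print(e_format_value(".376 E 03"))
print(e_format_value(".1294E-01"))
```

Decimal arithmetic is used here so that the shifted value is exact rather than subject to binary rounding.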

E.  Punched  Output 

All  programs  which  have  output  addresses  marked  with  the  symbol 
Q.   can  punch  output  directly  by  using  the  X  output  address.   X  is  the 
abbreviation  for  cards  as  output.   C  used  as  an  abbreviation  for  an  output 
address  will  be  an  error.   Punched  output  generated  by  the  use  of  the 
X  output  address  will  be  in  E-format.   (See  section  above).   X(F)  is  not 
a  valid  form  and  will  be  an  error. 

If  punched  output  is  desired  in  a  form  other  than  E-format  or 
from  a  program  which  does  not  allow  the  X  output  address,  then  the 
researcher  must  make  a  copy  of  his  data  on  temporary  storage  and  go  to 
the  MATRIX  program  and  use  the  PUNCH  Instruction  provided  in  that 
program. 

F.  Obtaining Additional Input/Output Sources

It happens that 15 temporary storage locations may not be
enough.  Additional temporary storage may be obtained by calling
for S16 through S40.  Use of S16 through S40 requires the addition of Job
Control Cards to the 360 system cards of the SOUPAC program deck.  At
least the first time, the researcher should check with SOUPAC consultants
before doing this: first, to learn to do it correctly if he does not already
know how; and second, if he does know how, to make sure none of the Job
Control Language has been changed.  Such changes can result from 360 system
changes or reconfigurations, or from SOUPAC system changes, which may not be
announced, in contrast to SOUPAC program changes, which would have been
announced.

If in special instances even 40 temporary storage regions are not
sufficient or a situation arises where so-called DISK temporary storage is
required, there can be made available temporary storage regions called
DISK1 through DISK40.  Check with the SOUPAC consultants before using
these for the proper Job Control Cards and the proper SOUPAC prolog cards.

G.  Using  Owner  Data  Sources  or  Special  Input/Output  Requirements 
in  the  SOUPAC  system 

Users'  own  tapes  or  disk  packs  can  be  used  with  the  SOUPAC 
system  for  input  or  output.   See  Appendix  A  for  additional  information. 


Special  input/output  requirements  can  usually  be  handled  in  the 
SOUPAC  system  provided  the  requirements  can  be  handled  by  the  360  system 
at  all.   In  such  cases  check  with  the  SOUPAC  consultants. 

General problem types of the nature alluded to above would be
multiple file volumes, blocked input/output, formatted or unformatted
input/output, different kinds of record lengths and different forms of data
representation due to machine differences or differences in facilities
at other computer installations.


AUTOCORRELATION  AND  SPECTRAL  ANALYSIS 


I.   General Description

The  calculation  of  autocorrelation  coefficients  and  the  determination 
of  power  spectra  are  of  interest  to  economists  in  the  study  of  time  series 
and  others  whose  interest  leads  them  to  suspect  some  repetition  of  variation 
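within a set of observations.  For a single variable, the autocorrelation,
r_p, is calculated as follows:

$$
r_p = \frac{(N-p)\sum X_i X_{i+p} - \sum X_i \sum X_{i+p}}
{\left[(N-p)\sum X_i^2 - \left(\sum X_i\right)^2\right]^{1/2}
 \left[(N-p)\sum X_{i+p}^2 - \left(\sum X_{i+p}\right)^2\right]^{1/2}}
\tag{1}
$$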

N  is  the  total  number  of  observations;  p  is  an  arbitrarily  chosen  time  lag. 
The  power  spectrum  is  a  Fourier  transformation  of  the  autocovariances  and 
is used in the harmonic analysis of X_i as a function of time.  Raw estimates
of the spectral density are given by formula (2) and a smoothed value is
given by formula (3):

$$
L_p = W_0 + 2\sum_{q=1}^{m-1} W_q \cos\frac{qp\pi}{m} + W_m \cos p\pi
\tag{2}
$$

$$
U_p = 0.23\,L_{p-1} + 0.54\,L_p + 0.23\,L_{p+1}
\tag{3}
$$

W_p is the covariance of X_i with X_{i+p}.

This program, however, not only will calculate values of X_i with X_{i+p},
but will calculate values for all possible pairs of variates, X_{i,k} with
X_{i+p,j}.
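
Formulas (1) through (3) can be sketched in Python as follows. This is an illustration only, not the SOUPAC implementation. It assumes that every sum in (1) runs over the N-p overlapping pairs, that m is the largest lag for which a covariance W_q is supplied, and that only interior points are smoothed, since the treatment of the end points is not stated here. All function names are hypothetical.

```python
import math

def lagged_correlation(x, y, p):
    """Formula (1): correlation of x[i] with y[i+p] over the N-p overlapping pairs."""
    n = len(x) - p
    first, shifted = x[:n], y[p:p + n]
    sxy = sum(a * b for a, b in zip(first, shifted))
    sx, sy = sum(first), sum(shifted)
    sxx = sum(a * a for a in first)
    syy = sum(b * b for b in shifted)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

def raw_spectrum(w):
    """Formula (2): raw estimates L_0..L_m from covariances W_0..W_m."""
    m = len(w) - 1
    return [
        w[0]
        + 2 * sum(w[q] * math.cos(q * p * math.pi / m) for q in range(1, m))
        + w[m] * math.cos(p * math.pi)
        for p in range(m + 1)
    ]

def smoothed_spectrum(raw):
    """Formula (3) applied to interior points only: returns U_1..U_(m-1)."""
    return [
        0.23 * raw[p - 1] + 0.54 * raw[p] + 0.23 * raw[p + 1]
        for p in range(1, len(raw) - 1)
    ]

series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(lagged_correlation(series, series, 2))  # autocorrelation at lag 2
```

Passing the same series twice gives the autocorrelation; passing two different series of equal length gives a lagged cross-correlation.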

For  a  more  detailed  discussion,  see: 

R.  W.  Southworth,  "Autocorrelation  and  Spectral  Analysis" 
from  Mathematical  Methods  for  Digital  Computers,  by  Anthony 
Ralston and Herbert Wilf; John Wiley and Sons, 1960, pp. 213-20.

II.   Restrictions 

This program is designed to accept from 1 to 15 variables.  The number
of observations is limited to 2500 or less.  Since all possible pairs of
lagged cross-correlations are printed for values of p from p to p + kd (p,
k, and d are set by program parameters), the output under maximum circumstances
will be very large.  The value of p is limited to 1500 or less.  Current
literature suggests that for a correlation between X_i and X_{i+p}, the value of
p should not exceed 10 per cent of the total number of observations.  It
should be pointed out, however, that a set of observations, X_i, can be
broken up into blocks of equal size, and the several blocks can be treated as
additional variables.  In this way, the value of p can exceed 1500 and also
the value of N can exceed 2500.

III.   Parameters

Data may be read by the program either from cards or from any temporary
storage medium.  In the output, X is the lead variable and Y is the
lag variable.



The call card will have the program name first.  After this the parameters
are in the following order:

Parameter
Number      Use or Meaning

1           Input Address.  CARDS or SEQUENTIAL 1-15.

2           An integer number denoting the minimum number
            of lag periods, p.

3           An integer number denoting the maximum lag,
            p + kd, at which time the execution of the
            program is terminated.

4           An integer number denoting the increment, d,
            to be added to p, so that the lag period can
            be altered.

Parameters 3 and 4 will be useful generally when one wants to study the
changes in the power spectra as a function of lag length.

5           The presence of a number greater than 0
            indicates that the means, standard deviations,
            and correlations are to be printed.

6           Standard SOUPAC output parameters consisting of
            an output address, and/or /PRINT, and/or /X.
            If the punch or print option is chosen, then
            the quantities output are the lag period,
            autocorrelation coefficient, autocovariance function,
            raw spectrum, and smoothed spectrum in I5, 4E18.8
            format.

7           A number greater than 0 indicates that the lead-
            lag sums and cross-products are to be printed.

If  a  parameter  is  left  blank,  this  is  the  same  as  specifying  a  zero.   It 
should  be  pointed  out  that  the  output  for  several  variables  is  quite  large, 
and  unless  there  is  interest  in  the  output,  several  of  the  parameters  should 
be  left  as  blanks. 
This version of BALANOVA 5 is unchanged as far as computations are
concerned from the original program written by Paul Herzberg in August,
1966.  See his write-up for explanation and references for the calculations.

Each observation (row of data) input to this program must be identified
by a number for each factor including the replication factor.  These
numbers  (which  cannot  be  punched  in  I  format)  represent  the  levels  of  the 
corresponding  factors  and  must  precede  the  dependent  variables.   In  the 
output  produced  by  the  program,  each  factor  is  given  a  unique  letter  name, 
beginning  with  A.   Thus  the  first  column  of  the  input  data  corresponds  to 
the  levels  of  factor  A  which  is  described  on  the  first  factor  specification 
card  (see  below).   Each  additional  factor  is  given  the  next  letter  in  the 
alphabet,  and  a  corresponding  factor  specification  card.   The  dependent 
variables  follow  the  factor  levels  on  the  input  data,  and  they  are  numbered 
one  through  the  total  number  of  dependent  variables,  in  the  output  of  the 
program. 

On  the  program  call  card,  the  following  parameters  follow  the  program 
name,  BALANOVA  5;  with  the  first  four  parameters  being  required. 


Parameter
Number      Use or Meaning

1           Input Address.  SEQUENTIAL 1-15 or CARDS.

2           Number of factors, counting the replication
            factor if there is one.  Maximum = 10.

3           Number of dependent variables.
            Maximum = 200.

4           Number of levels of the 1st factor.

5-13        Number of levels of the 2nd - 10th factors.

14          1 if an unweighted means analysis is desired even
            though the cell frequencies are proportional.

Following the program card is a separate subparameter card (factor
specification card) for each factor in the order in which the factors appear in
the input data.  Each card has the following parameters.

Parameter
Number      Use or Meaning

1           0 if fixed factor
            1 if random factor

2           0 if not the replication factor
            1 if it is the replication factor

3-11        Factors in which this factor is nested

As in other SOUPAC programs, parameters at the end of the card which are
not used may be omitted, with the period appearing after the last non-zero
parameter.  The factor specification cards must be followed by an END PROGRAM
card.
Chapter 1.   General Description

1.1  Introduction 

BALANOVA  5  is  a  general  analysis  of  variance  program  applicable  to  a 
wide range of balanced designs.  In the case of designs with a replication
factor,  BALANOVA  5  allows  inequality  in  the  number  of  replications  in  each 
cell.   If  the  number  of  replications  is  equal  or  proportional,  the  analysis 
is  handled  by  least  squares  (weighted  means).   If  the  number  of  replications 
is  not  proportional  then  an  unweighted  means  analysis  is  performed.   This 
is  an  approximation  to  the  least  squares  solution. 

BALANOVA  5  accepts  some  designs  that  are  not  completely  crossed,  namely 
those  nested  designs  in  which  all  main  factors  are  balanced.   Hence  hierarchical 
designs  are  allowed.   As  well,  repeated  measures  designs  are  allowed.   In  these 
designs  the  replication  factor  is  not  nested  in  all  the  other  factors. 

The design model may be fixed-effects, random-effects or mixed.  BALANOVA 5
automatically determines all the legal sources of variation (main effects and
interactions) and determines the correct denominator mean square for those
sources  which  can  be  tested  by  F  test.   In  order  to  do  this,  BALANOVA  5  first 
generates  the  expected  mean  square  table  which  is  printed  in  readable  form. 
The  method  used  closely  follows  Scheffe  (1959),  Chapter  8. 

BALANOVA  5  will  accept  most  of  the  designs  described  in  Winer  (1962), 
Chapters 3, 4, 5, 6, and 7 and Lindquist (1953), Chapters 3, 5, 6, 6, 8, 9,
10,  and  13  (Types  I,  III,  VI).      Chapter  2  and  the  Appendix  of  this  manual 
contain  a  large  number  of  examples  drawn  from  these  two  books. 

The  author  was  somewhat  reluctant  to  develop  BALANOVA  5  and  is  hesitant 
to  encourage  its  wide  use  for  the  following  reasons: 

1.  A general program such as BALANOVA 5 encourages the use
of statistics in a "cook-book" manner.  Data is generated
to fit the input specifications of the program with no
consideration given to the theory of analysis of variance.
The experimenter who uses a computer program in this way
often neglects to consider whether the statistical test
is appropriate for the work he is interested in and whether
the assumptions needed for the test are satisfied in the
particular experiment he has used.

2.  In the author's experience, the results printed by such a
program as BALANOVA 5 have a certain finality which encourages
the user to accept the results as gospel truth.  When used
in this way, the experimenter forgets that there is a real
possibility of programming or machine error.

3.  In the particular case of analysis of variance, the idea
has become widespread that the summary table of F ratios
is the most important part of the analysis.  This is not
the case.  The most important part of analysis of variance
is the estimation of the main effects and the
interactions.  Only by looking at their size can
the experimenter evaluate what is happening in
his experiment.  In order to encourage this use
of analysis of variance, BALANOVA 5 prints a table
of marginal means which allows easy calculation of
all the effects in the experiment.  The F table is
only a set of warning signals.  A non-significant
F indicates that the corresponding differences
between effects can be attributed to sheer chance.

4.  BALANOVA 5 performs an unweighted-means analysis when
the replication numbers are non-proportional.  The
author fears that this option will be used too often and
without consideration of its dangers.  The unweighted-
means solution is often not satisfactory and references
on analysis of variance should be consulted (Scheffe,
1959, Winer, 1962, Lindquist, 1953).

BALANOVA  5  was  designed  to  reduce  the  great  amount  of  hand  computation 
needed  in  analysis  of  variance  calculations.   It  was  not  intended  to  eliminate 
the necessity of the user being familiar with the theory of analysis of
variance.  It is hoped that the above comments will discourage some indiscriminate
use of BALANOVA 5.

1.2  Special  Features 

The  output  from  BALANOVA  5  consists  of 

1.  A table of the expected mean squares in readable
form.

2.  The number of replications in each cell in the
case of designs with a replication factor.

3.  The table of marginal means.  All means entering
in the computation of the sum of squares are printed.

4.  The analysis of variance summary table including,
for each source of variation, the sum of squares
and mean square, and for each source with denominator,
the F ratio and the probability of the chance occurrence
of the F ratio.

A  feature  of  BALANOVA  5  is  its  flexible  specification  of  analysis  of 
variance  designs,  allowing  a  wide  range  of  designs  to  be  described  by  a 
common  code. 

A  large  number  of  checks  are  made  by  BALANOVA  5  to  ensure  that  the  design 
is  legal  and  that  the  data  correspond  to  the  design.   Diagnostics  are  printed 
to  indicate  all  error  conditions. 
1.3  Legal Design of BALANOVA 5

Consider the following definitions, taken from Scheffe (1959).  Let
there be p factors in a design, not counting the replication factor, if
there is one.  A cell is specified by a set of p levels, one for each
factor.  The layout of design is complete if there is at least one
observation in every cell.  The factors in such a design are completely
crossed.  If the design is complete and there is a replication factor
(i.e. all cells have at least one observation and at least one cell has
more than one observation) then the design is considered to be a Class A
design in BALANOVA 5.

There  are  many  analysis  of  variance  designs  which  are  not  complete 
in  the  above  sense.   Examples  of  incomplete  designs  are  Latin-square, 
incomplete  blocks  and  nested  designs.   The  only  incomplete  designs 
which  are  allowed  in  BALANOVA  5  are  nested  designs  which  are  balanced  in 
all  factors  except  for  the  replication  factor  (which  need  not  be  balanced) . 
These incomplete designs are called Class B and C designs.  "Nesting",
"balanced" and "replication factor" are defined in the next three
paragraphs.  These definitions are illustrated in Chapter 2.

Nesting  may  be  defined  as  follows :  The  levels  of  a  factor  C  are 
nested  within  the  levels  of  a  factor  A  (in  short,  C  is  nested  within  A) 
if  and  only  if  each  level  of  C  appears  with  only  a  single  level  of  A  in 
the observations.  Note that if C is not nested within A, it is crossed
with A, but only if every level of C appears with every level of A is
C completely crossed with A.  Latin-square and incomplete block designs
are only partly crossed.
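
The definitions of nesting and complete crossing can be tested directly against the factor-level columns of a data set. The following Python sketch is an illustration only, not part of BALANOVA 5. It assumes each observation is a tuple of factor levels, that factors are referred to by column position, and that the levels of C are numbered uniquely across the levels of A; the function names are hypothetical.

```python
from collections import defaultdict

def levels_seen_with(observations, c, a):
    """Map each level of factor c to the set of levels of factor a it appears with."""
    seen = defaultdict(set)
    for row in observations:
        seen[row[c]].add(row[a])
    return seen

def is_nested(observations, c, a):
    """C is nested within A if each level of C appears with a single level of A."""
    return all(len(levels) == 1
               for levels in levels_seen_with(observations, c, a).values())

def is_completely_crossed(observations, c, a):
    """C is completely crossed with A if every level of C appears with every level of A."""
    all_a = {row[a] for row in observations}
    return all(levels == all_a
               for levels in levels_seen_with(observations, c, a).values())

# Column 0 is factor A, column 1 is factor C.
nested_design = [(1, 1), (1, 2), (2, 3), (2, 4)]
factorial_design = [(1, 1), (1, 2), (2, 1), (2, 2)]
print(is_nested(nested_design, c=1, a=0))
print(is_completely_crossed(factorial_design, c=1, a=0))
```

A design in which both functions return False for some pair of factors is only partly crossed in the sense described above.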

A  nested  factor  C  is  balanced  if  the  number  of  levels  of  C  is  the  same 
within  each  combination  of  those  factors  within  which  C  is  nested  and  the 
factors (if any) which are crossed with C are completely crossed.

A  replication  factor,  in  BALANOVA  5,  is  a  factor  which  is  nested  within 
one  or  more  other  factors,  but  not  necessarily  within  all  other  factors. 
Furthermore,  no  factor  may  be  nested  within  the  replication  factor.   That  is, 
a  factor  is  a  replication  factor  if  and  only  if  for  every  other  factor  A  in 
the  design,  it  is  either  nested  within  A  or  crossed  with  A.   A  replication 
factor  may  be  nested  within  some  factors  and  crossed  with  others.   There  can 
be  at  most  one  replication  factor  in  a  design. 

The  distinction  is  made  between  replication  factors  and  other  nested 
factors  in  BALANOVA  5  since  replication  factors  do  not  have  to  be  balanced. 
All  other  factors  must  be  balanced. 

Using these definitions, the following designs are legal in BALANOVA 5.

Class  A  designs  (completely  crossed  with  nested  replications) 

Class  A  designs  contain  (p  +  1)  factors  of  which  p  are  the  main  factors 
and  the  other  factor  is  the  replication  factor.   The  following  two  conditions 
must  both  be  met  for  the  design  to  be  Class  A. 
(a)  All p main factors are completely crossed.

(b)  The replication factor is nested in all main factors.

Thus one-way and factorial designs are Class A designs.

Class  B  designs  (other  replication  designs) 

Class B designs also contain (p + 1) factors of which p are the main
factors  and  the  other  factor  is  the  replication  factor.   However  one  or 
both  of  the  two  conditions,  (a)  and  (b),  are  not  satisfied  in  Class  B 
designs. 

When  (a)  is  not  satisfied,  that  is,  the  p  main  factors  are  not 
completely  crossed,  then  the  main  factors  must  satisfy  the  following 
condition. 

(a')   Consider  any  two  main  factors,  A  and  B.   Either  A  is  completely 
crossed  with  B,  or  A  is  nested  within  B  or  B  is  nested  within 
A.   This must be true for all pairs of main factors.  Furthermore,
at least one pair must have the nested relationship or
else (a) would be satisfied.

When  (b)  is  not  satisfied,  then  the  following  condition  must  be  true. 

(b')   The  replication  factor  is  nested  in  at  least  one  but  not  all 
main  factors.   Note  that  the  requirement  that  the  replication 
factor  be  nested  in  at  least  one  factor  is  part  of  the  basic 
definition  of  a  replication  factor. 

Class  B  designs  then  can  be  of  the  following  two  types. 

Hierarchical  designs:  (a')  and  (b)  are  satisfied.  The  replication 
factor  is  nested  in  all  factors  but  there  is  some  nesting  among  the  main 
factors. 

Repeated measures designs:   (b') is satisfied.   Either (a) or (a') can
be  satisfied.   The  necessary  feature  (b')  of  repeated  measures  designs  is 
that  the  replication  factor  is  crossed  with  one  or  more  of  the  main  factors. 
The  factors  in  which  the  replication  factor  is  nested  may  themselves  be  either 
crossed  (a)  or  nested  (a'). 

Class  C  designs  (no  replication  factor) 

Class  C  designs  have  p  factors  and  there  is  no  replication  factor. 
All  factors  must  be  balanced.   For  each  pair  of  factors,  e.g.  factors  A 
and  B,  either  A  is  completely  crossed  with  B,  or  A  is  nested  within  B  or 
B  nested  within  A.   There  does  not  necessarily  have  to  be  any  nesting 
at  all. 
In summary, then, designs are classed in the following way in BALANOVA
5.  Class A and B designs have a replication factor, Class C designs do not.
Class A designs are distinguished from Class B designs in that a Class A
design must have 1) all main factors completely crossed and 2) the replication
factor nested in all main factors.  Class B designs violate one or both
of these requirements.

In  Class  A  and  B  designs,  the  replication  factor  does  not  need  to  be 
balanced.  However all nested factors, except the replication factor (if
any), must be balanced.   Recall that in Class A designs, the replication
factor  is  the  only  nested  factor. 

As explained above, the replication factor is distinguished from
other  nested  factors  since  it  does  not  have  to  be  balanced.   There  are 
two  other  reasons  for  distinguishing  the  replication  factor  from  other 
nested  factors.   These  reasons  are  important  even  if  the  replication 
factor  is  balanced. 

1.  In  Class  A  designs  (completely  crossed  with  replications)  only 
cell  means  are  stored  in  the  computer  and  thus  very  large  designs 
can  be  accommodated.   The  allowable  number  of  replications  in 
each  cell  is  virtually  unlimited. 

2.  For all replication designs, whether of Class A or B, the level
number for the replication factor in each nest does not need
to run from one up to the maximum number of levels in each
nest as it does for all other factors.   Any convenient numbering
of the replications may be used (e.g. a unique number for every
subject in the experiment, regardless of the nest within which
he is).   This feature of BALANOVA 5 is especially useful when
several dependent variables are analyzed and there is missing
data for some of the subjects for some of the dependent variables.

1.4  Calculations for equal and unequal number of replications

The  calculations  performed  by  BALANOVA  5  for  designs  with  a  replication 
factor  (Class  A  and  B  designs)  depend  on  whether  the  number  of  replications 
in each cell are equal or unequal.   If the numbers are equal, the standard
analysis of variance calculation is made (least-squares or weighted means
analysis).   If the numbers are unequal, a check is first made to see if the
cell N's are proportional.   In a two-way analysis of variance, for example,
the cell N's are proportional if the number of replications in the ij cell,
N_ij, satisfies

$$
N_{ij} = \frac{N_{iT} \times N_{Tj}}{N_{TT}}
$$

where the T's indicate marginal totals.   If the cell N's are proportional,
BALANOVA 5 makes the least-squares calculations, i.e. weighted means are
used.   If the cell N's are not proportional, then the method of unweighted
means is used (See Scheffe, pp. 262-3 or Winer, pp. 222-4).
In general, if i, j, k, ..., X are those factors within which the
replication factor is nested (not necessarily all the factors in the design),
and if N_{ijk...X} is the number of replications in a particular nested cell,
then the cell N's are proportional, if, for all combinations ijk...X,

$$
N_{ijk\ldots X} = \frac{N_{iTT\ldots T} \times N_{TjT\ldots T} \times \cdots \times N_{TTT\ldots X}}
{\left(N_{TTT\ldots T}\right)^{q-1}}
$$


In  this  formula,  the  T's  indicate  marginal  totals  and  q  is  the  number  of 
factors  within  which  the  replication  factor  is  nested.   In  particular,  the 
one-way analysis with unequal N's is a proportional design (i.e. the cell
N's are proportional) by this definition, since

$$
N_i = \frac{N_i}{\left(N_T\right)^{0}} = N_i
$$


In  fact  any  design  in  which  the  replication  factor  is  nested  in  only  one 
factor  is  a  proportional  design. 
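
The proportionality condition can be checked numerically from a table of cell counts. The Python sketch below is an illustration of the formula, not the test as coded in BALANOVA 5. It assumes the counts are supplied as a mapping from a tuple of q levels to the number of replications, and that a cell missing from the mapping has zero replications; the function name cells_proportional is hypothetical.

```python
from itertools import product
from math import prod

def cells_proportional(counts, tol=1e-9):
    """True if every cell count equals the product of its marginal totals
    divided by the grand total raised to the power q - 1."""
    q = len(next(iter(counts)))      # number of factors the replications are nested in
    total = sum(counts.values())     # grand total, N with all subscripts T

    # Marginal totals for each level of each factor.
    marginals = [dict() for _ in range(q)]
    for cell, n in counts.items():
        for factor, level in enumerate(cell):
            marginals[factor][level] = marginals[factor].get(level, 0) + n

    for cell in product(*(sorted(m) for m in marginals)):
        expected = prod(marginals[f][lev] for f, lev in enumerate(cell)) / total ** (q - 1)
        if abs(counts.get(cell, 0) - expected) > tol:
            return False
    return True

two_way = {(1, 1): 2, (1, 2): 4, (2, 1): 3, (2, 2): 6}
print(cells_proportional(two_way))
```

In the two-way example the row totals are 6 and 9, the column totals 5 and 10, and the grand total 15, so cell (1, 1) should hold 6 x 5 / 15 = 2 replications, which it does.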

1.5   Specification  of  a  Design 

Any  design  is  described  by  listing  the  following  information  about  each 
factor  in  the  design,  including  the  replication  factor  if  there  is  one.   The 
information for each factor is punched on a separate card (a factor
specification card), and the cards should be in the same order as the factors are
in  the  input  data.   Each  parameter  should  be  enclosed  in  parentheses,  and 
each  card  terminated  by  a  period. 


Parameter 1       Type of factor.   The first parameter on each
                  factor specification card should be a zero
                  if the factor is fixed, and a one if it is
                  random.   The replication factor is always a
                  random factor.   At least one factor in every
                  design must be random.

Parameter 2       Replication Factor.   If the design has a
                  replication factor, this is indicated by
                  punching a one for the second parameter.
                  A design may have only one replication factor.
                  If there is no replication factor, the second
                  parameter should be zero (or blank) on all
                  of the factor specification cards.

Parameter 3-11    Nesting.   The factors in which the given factor
                  (the one to which this card refers) is nested
                  are listed.   Factors are numbered from one
                  through the number of factors in the design.
                  If the factor is not nested, parameters 3-11
                  may be completely omitted.
An example of this way of specifying a design will be now given.
Consider  a  two-way  analysis  of  variance  with  subjects  within  cells.   The 
design  is  considered  to  have  three  (not  two)  factors,  namely  A  and  B, 
the  main  factors,  and  C,  the  replication  factor.   Suppose  there  are  3 
levels of A and 4 levels of B and that each cell has 10 subjects.   The
cards  used  to  perform  this  analysis  are  listed  below.   Each  line  corresponds 
to  one  IBM  card. 

TRANSFORMATIONS(CARDS)(SEQUENTIAL 1)(4).
END PROGRAM
BALANOVA(SEQUENTIAL 1)(3)(1)(3)(4)(10).
(0)(0).
(0)(0).
(1)(1)(1)(2).
END PROGRAM

The first card listed above calls the TRANSFORMATIONS program and uses
it to store raw data from cards onto sequential file number 1 (SEQUENTIAL 1).
The number four is the total number of variables, independent (3) + dependent
(1). Since no transformations are performed, the END PROGRAM card immediately
follows the main program card.

The third card listed above calls the BALANOVA program. The first
parameter is the location of the data (SEQUENTIAL 1), the second is the number
of factors (3), the third is the number of dependent variables (1), the fourth
is the number of levels of the first factor (3), followed by the number of
levels of the second factor (4), and finally the number of levels of the
last factor, which is the maximum cell size (10) when considering the
replication factor.

The  fourth  card  is  the  factor  specification  card  for  factor  1.   The  first 
2  parameters  are  zero,  labeling  this  factor  as  fixed,  and  as  not  being  the 
replication  factor.   The  fifth  card  is  the  factor  specification  card  for  factor 
2  which  is  also  fixed,  and  not  the  replication  factor.   Note  that  parameters 
3-11  are  blank,  as  factors  1  and  2  are  not  nested  in  any  other  factors. 

The sixth card is the factor specification card for factor 3. Its four
parameters denote it as a random factor, as the replication factor, and as
nested in factors 1 and 2. The seventh card terminates the BALANOVA program.
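
The card layout just described can be generated mechanically. The Python sketch below is an illustration only: it is not part of SOUPAC, and the function names `factor_card` and `balanova_deck` and the dictionary keys are hypothetical. It simply writes the parameters in the order given in this chapter.

```python
def factor_card(random, replication, nested_in=()):
    """Format one factor specification card.

    Parameter 1: 0 = fixed, 1 = random.
    Parameter 2: 1 if this is the replication factor, otherwise 0.
    Parameters 3-11: numbers of the factors in which this factor is nested.
    """
    fields = [int(random), int(replication), *nested_in]
    return "".join(f"({v})" for v in fields) + "."


def balanova_deck(data_address, n_dependent, factors):
    """Return the BALANOVA cards for a design, one string per card."""
    levels = "".join(f"({f['levels']})" for f in factors)
    cards = [f"BALANOVA({data_address})({len(factors)})({n_dependent}){levels}."]
    for f in factors:
        cards.append(factor_card(f["random"], f["replication"], f.get("nested_in", ())))
    cards.append("END PROGRAM")
    return cards


# The two-way design of the example: A (3 levels), B (4 levels),
# and subjects (10 per cell) as the replication factor nested in A and B.
design = [
    {"levels": 3, "random": False, "replication": False},
    {"levels": 4, "random": False, "replication": False},
    {"levels": 10, "random": True, "replication": True, "nested_in": (1, 2)},
]
deck = balanova_deck("SEQUENTIAL 1", 1, design)
print("\n".join(deck))
```

The printed cards correspond to cards three through seven of the example above.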

Chapter 2. Design Examples

OMITTED TEMPORARILY

Chapter 3. Preparation of Input

3.1 Introduction

The following rules apply to the assignment of factor levels in all
types of designs. There is usually no need to rekeypunch existing data,
however, as it is almost always possible to create the factor levels in
the TRANSFORMATIONS program. If you need help using this program, see a
SOUPAC consultant.

(a)   Non-replication Factors in Class A, B and C designs

The levels for non-replication factors must run from one (1)
consecutively up to the number of levels given on the BALANOVA call card.
E.g., if a factor represents four treatment groups, these groups must be
numbered 1, 2, 3, and 4 and each subject's row or rows in the data matrix
must have a 1, 2, 3, or 4 punched to indicate the group he is in. If the
factor is nested, the level numbers must run from one (1) up, in each
cell of the nest. See the example given in Section 3.2.

(b)   Replication  Factors  in  Class  A  design 

The replication numbers (level numbers) can be anything, for example,
a subject identification number. The subject numbers do not have to be
unique either within a group or between groups. In fact, in Class A
designs the replication level is not used, but it must nevertheless
appear, even if it is a dummy. This statement does not apply to
other design classes.

(c)   Replication Factors in Class B designs, of repeated measures type

Special care must be taken with the replication levels in these designs.
Let us divide the non-replication factors into two groups:

α-set:   those factors in which the replication factor is nested.

β-set:   all other factors, i.e. those factors crossed with the
replication factor.

If the β-set is empty, the design is of hierarchical rather than repeated
measures type. See paragraph (d) below.

Let us denote by an α-cell a particular set of levels of the factors in
the α-set. The replications in this cell may be any values (not necessarily
from one (1) up) but must be distinct. Again, the numberings in two different
α-cells do not have to be distinct, but can be. In other words, if the
replication factor is subjects, an identification number may be used as the
replication level. Each subject appears in more than one row (card) of
the data matrix, since each subject appears with every combination of levels
of the factors in the β-set. Every row that refers to the same subject must
therefore have the same replication number. This is the only way that
BALANOVA 5 can tell that two different rows refer to the same subject.

(d)   Replication Factors in Class B designs, of hierarchical type

The  subject  numbers  must  all  be  different  within  any  one  cell  (one 
level  set  of  the  non-replication  factors)  but  may  be  the  same  over 
different  cells. 
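
Rule (a) above (levels of a non-replication factor numbered consecutively from 1, and from 1 within each cell of the nest for a nested factor) is normally satisfied by recoding in the TRANSFORMATIONS program. The Python sketch below only illustrates the recoding itself, outside SOUPAC; the function names `recode_levels` and `recode_nested` are hypothetical.

```python
def recode_levels(codes):
    """Map arbitrary level codes to 1, 2, ..., k (in sorted order of the codes)."""
    mapping = {code: i for i, code in enumerate(sorted(set(codes)), start=1)}
    return [mapping[c] for c in codes]


def recode_nested(parent_cells, codes):
    """Recode a nested factor so its levels run from 1 up within each parent cell."""
    by_cell = {}
    for cell, code in zip(parent_cells, codes):
        by_cell.setdefault(cell, set()).add(code)
    maps = {
        cell: {c: i for i, c in enumerate(sorted(found), start=1)}
        for cell, found in by_cell.items()
    }
    return [maps[cell][code] for cell, code in zip(parent_cells, codes)]


# Four treatment groups originally coded 10, 20, 30, 40.
groups = recode_levels([10, 30, 20, 40, 10, 30])

# Hospitals 11 and 12 used drug 1; hospitals 21 and 22 used drug 2.
hospitals = recode_nested([1, 1, 2, 2], [11, 12, 21, 22])
print(groups, hospitals)
```

Replication levels need no such recoding; they only have to satisfy the distinctness rules in paragraphs (b) to (d).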

Special  note  on  missing  data 

If a dependent variable field on a card is totally blank, BALANOVA 5
does not include the score in the analysis for the given dependent variable.
However, other non-blank dependent variable fields on the same card will be
included in their respective analyses.

Do not confuse this deletion of missing data with an error comment by
BALANOVA 5 to the effect that there is no data in cell A = 1, B = 2. This
comment means that no data card with A = 1, B = 2 had non-blank data for
the given dependent variable.

3.2  Data  matrix  examples 

Class  A  design 

In  all  the  following  examples,  it  is  assumed  data  is  stored  on 
sequential  file  number  1  (SEQUENTIAL  1) . 

Consider  a  two-way  design  with  three  subjects  in  each  cell.   For  the 
purpose  of  BALANOVA,  subjects  are  also  considered  to  be  a  factor,  the 
replication  factor.   Suppose  that  there  are  two  dependent  variables,  and 
further  that  the  factor  specification  cards  are  listed  in  the  order  given 
below,  following  the  main  program  card. 

BALANOVA(SEQUENTIAL 1)(3)(2)(2)(3)(3).
(0)(0).
(1)(1)(1)(3).
(0)(0).
END PROGRAM

Note  that,  contrary  to  the  usual  case,  the  replication  is  the  second  factor. 
This  illustrates  one  flexible  feature  of  BALANOVA  5.   A  data  matrix  could  be 


    1     1     1     20     19
    1     2     1      8      8
    1     3     1      4      4
    1     4     2     -3      6
    1     5     2      4     10
    1     6     2      2      3
    1     7     3      4      4
    1     8     3      6      2
    1     9     3      8      4
    2    10     1      2      7
    2    11     1     -4      8
    2    12     1     25      2
    2    13     2    126     15
    2    14     2      2     20
    2    15     2      3      3
    2    16     3      4      4
    2    17     3      5     -1
    2    18     3      3      2

Note  that  the  first  column  is  the  A  level,  the  third  column  is  the  C  level 
and  the  second  column  is  the  replication  level,  which  in  Class  A  designs  can 
be  anything.   The  last  two  columns  are  the  dependent  variables.   Each  row 
of  the  data  matrix  would  be  punched  on  one  or  more  cards.   A  possible  format 
would  be  (3F5.0,3X,2F6.0). 

The  order  of  the  rows  is  immaterial.   They  could  be  in  any  order  and 
have  been  written  in  a  systematic  order  only  for  convenience. 

Class  B  design  -  repeated  measures 

The data in Winer, Table 7.2-3 could be analyzed with the following
program, using three factors and one dependent variable.

BALANOVA(SEQUENTIAL 1)(3)(1)(2)(4)(3).
(0)(0).
(0)(0).
(1)(1)(1).
END PROGRAM


Data Cards:

    1     1     1     0
    1     2     1     0
    1     3     1     5
    1     4     1     3
    1     1     2     3
    1     2     2     1
    1     3     2     5
    1     4     2     4
    etc.
    2     1     5     5
    2     2     5     4
    2     3     5     6
    2     4     5     6
    2     1     6     7
    2     2     6     5
    2     3     6     8
    2     4     6     9

Again, the rows could be in any order. The subjects in the second level of factor
A could be assigned level numbers 1, 2, 3 or any other three distinct numbers.
A possible format for this matrix is (2F5.0,F6.0,1X,F7.0).

Another repeated measures design is Winer, Table 7.4-3. Here we have
four factors and one dependent variable.

BALANOVA(SEQUENTIAL 1)(4)(1)(2)(2)(3)(4).
(0)(0).
(0)(0).
(1)(1)(1)(2).
(0)(0).
END PROGRAM


Data Cards:

    1     1     1     1     18
    1     1     1     2     14
    1     1     1     3     12
    1     1     1     4      6
    1     1     2     1     19
    1     1     2     2     12
    1     1     2     3      8
    1     1     2     4      4
    etc.
    1     2     6     1     18
    1     2     6     2     10
    1     2     6     3      5
    1     2     6     4      1
    2     1     7     1     16
    2     1     7     2     10
    2     1     7     3      8
    2     1     7     4      4
    etc.
    2     2    12     1     16
    2     2    12     2     12
    2     2    12     3      8
    2     2    12     4      8

The assignment of levels to factors A, B, D (columns 1, 2, 4) has to be as
shown above, but the subject numbers can be changed provided that, for each
(A,B) cell, no two subjects have the same number. A possible format is
(2F5.0,1X,2F5.0,F12.0).

Class  B  design  -  hierarchical 

Consider the following design with three factors and one dependent
variable.

BALANOVA(SEQUENTIAL 1)(3)(1)(2)(2)(4).
(0)(0).
(0)(0)(1).
(1)(1)(1)(2).
END PROGRAM

This is a hospitals (factor 2) within drugs (factor 1) design, illustrated
below, with unequal cell size.

Data Cards:

    1     1     1     5.0
    1     1     2     4.2
    1     2     1     5.6
    1     2     2     3.2
    1     2     3     4.6
    2     1     1     5.3
    2     1     2     8.2
    2     1     3     4.3
    2     1     4     6.3
    2     2     1     5.7
    2     2     2     6.8

Note that the hospitals are numbered 1, 2 in each level of factor 1 even though
there are four different hospitals involved. This is necessary since factor 2
is a non-replication factor; see the rules about Data Cards in Section 3.1.
There are 2 patients in hospital 1 for drug 1, 3 patients in hospital 2 for
drug 1, 4 patients in hospital 1 for drug 2 and 2 patients in hospital 2 for
drug 2. The patient numbering is flexible: it could be a different number
for every patient, regardless of hospital. A possible format is (3F5.0,F7.1).

Class  C  design 

The example in Winer, Table 4.3-1, could be set up as follows, with two factors
and one dependent variable.

BALANOVA(SEQUENTIAL 1)(2)(1)(5)(4).
(1)(0).
(0)(0).
END PROGRAM


Data Cards:

    1     1     30
    1     3     16
    2     1     14
    2     2     18
    2     3     10
    2     4     22
    1     2     28
    3     1     24
    3     2     20
    3     3     18
    3     4     30
    4     1     38
    1     4     34
    4     2     34
    4     3     20
    4     4     44
    5     4     30
    5     3     14
    5     2     28
    5     1     26

The rows have been written in a non-systematic order to emphasize that,
without exception, in BALANOVA 5 the data rows can be in any order. A
possible format is (2F5.0,F10.0).

Chapter 4. Program Details

4.1 Method and program flow

The program follows the procedures in Scheffe (1959), Chapter 8. Scheffe's
discussion will not be repeated here; only a general description of the
program flow will be given. The names of the subroutines used are indicated
in case reference is made to the program listing. Many of the minor steps
and subroutines are not described.

1.  Main  Program 

Calls  subroutines  and  routes  your  data  through  the  program. 

2.  Design input and check (INPUTD)

The Factor Specification Cards are read and checked for errors. Many
of the error conditions mentioned in Section 4.3 are checked in INPUTD.
The design is transformed into the symbolic notation of live, dead and
absent subscripts as in Scheffe.

3.  Derivation of all legal sources (LEGALS, NEWS)

All  possible  interactions  are  generated  but  only  one  interaction  with 
a  given  set  of  subscripts  is  retained.   The  procedure  is  identical  to  Scheffe, 
p.  277,  para.  1.   The  program  now  has  a  list  of  all  legal  sources  (including 
the  original  factors) . 

4.  Expected mean squares (AUXIL, EMS)

The expected mean squares for each source are, of course, not computable
numbers, but rather symbolic expressions (cf. last column, Table 8.2.2,
Scheffe). The program generates and prints these expressions in a form very
close to the normal printed form. The method is from Scheffe, pp. 284-8.

5.  Denominator for each source (FINDEN)

By  the  standard  procedures,  using  the  expected  mean  squares,  the  program 
determines  the  correct  denominator  (if  any)  for  each  source. 

6.   Sorting  of  sources  for  summary  table  (SORT) 

The sources are sorted in a convenient order, combining all sources
with the same denominator. This order is then used in printing the summary
table.

7.  Input of data (INPUTX)

The input data is read from the input device (the first parameter on the
main program card). The grand means for each dependent variable are computed,
ignoring missing (blank) data.

8.  Storage of data for one dependent variable (READX)

This routine, as well as all the remaining ones, is executed in a cycle,
once for each dependent variable. READX stores the data in core and checks
that no data is missing in the design. The data is actually stored as
deviations from the grand mean. This is done to improve accuracy. See
Section 4.2.

9.  Check of replication numbers (CELLN)

In the case of Class A and B designs, BALANOVA 5 checks whether the
cell frequencies are equal, proportional or non-proportional.

10.  Computation of sum of squares (SSEQU, SSPROP, XMEAN)

The  marginal  means  and  sums  of  squares  for  each  legal  source  are 
calculated. 

11.  Computation  and  printing  of  final  summary  table  (FISHER, FPRINT) 
These  calculations  are  made  in  the  standard  way. 
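
Step 9 above distinguishes equal, proportional and non-proportional cell frequencies. The manual does not state the test that CELLN applies, so the Python sketch below is an assumption: it uses the usual textbook definition for a two-way table of cell counts, under which the frequencies are proportional when every count equals (row total x column total) / grand total. The function name `classify_cell_frequencies` is hypothetical.

```python
def classify_cell_frequencies(counts):
    """Classify a two-way table of cell counts (a list of rows).

    Returns "equal", "proportional" or "non-proportional".
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    total = sum(row_totals)

    if len({n for row in counts for n in row}) == 1:
        return "equal"

    # Integer arithmetic: n_ij * N == n_i. * n_.j avoids rounding problems.
    proportional = all(
        counts[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    return "proportional" if proportional else "non-proportional"


print(classify_cell_frequencies([[3, 3], [3, 3]]))
print(classify_cell_frequencies([[2, 4], [3, 6]]))
print(classify_cell_frequencies([[2, 3], [4, 2]]))
```

The last table holds the cell sizes of the hospitals-within-drugs example in Section 3.2 (2, 3, 4 and 2 patients), treated here as a crossed table purely for illustration.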

4.2  Some comments on accuracy of computation

An  attempt  was  made  in  the  design  of  this  program  to  eliminate  the 
largest  sources  of  computational  inaccuracies  that  can  occur  in  analysis  of 
variance  calculations. 

Consider a one-way analysis with the following data:

          Group 1   Group 2   Group 3

           8.77      8.88      8.96
           8.79      8.90      8.99
           8.80      8.90      9.00
           8.82      8.91      9.02
           8.82      8.91      9.03

Means      8.80      8.90      9.00

Sums of squares computations are generally made as the sums and
differences of two or more terms. In this example, the exact calculations
would be

SS between = 1188.2500 - 1188.1500
           = 0.1000

SS within  = 1188.2554 - 1188.2500
           = 0.0054

Note that these answers each have a string of zeros following the given digits
since they are exact. However, on a computer with about 8 digit accuracy, the
differences would only be accurate to about four decimals due to the cancellation
of all the higher order digits by subtraction.

This is illustrated by a calculation using the previous analysis of
variance program in SOUPAC. SOUPAC's answers were:

SS between = 0.10000610          (5 significant digits)
SS within  = 0.0053863525        (2 significant digits)

Note the large errors in these SS. Even worse errors can occur in other data.

The  whole  problem  could  be  avoided  by  accumulating  a  true  sum  of  squares, 
that  is,  by  adding  positive  numbers  to  form  each  SS  rather  than  taking  a 
difference  of  two  large  numbers.   However  this  procedure  was  rejected  because 
it  is  extremely  slow. 


The following procedure is used in BALANOVA 5 and it is very effective.
The data are internally transformed to deviations from the grand mean. This
is why the grand means are computed in subroutine INPUTX before the deviations
are actually stored in the memory in subroutine READX. When the deviations
are used, the individual terms which are added and subtracted to give each
SS are now numbers of approximately the same size as the SS itself. This
means that the number of significant digits in the SS is large even if the
grand mean is large. In the example given above, the deviation scores are:


          Group 1   Group 2   Group 3

           -.13      -.02      +.06
           -.11       .00      +.09
           -.10       .00      +.10
           -.08      +.01      +.12
           -.08      +.01      +.13

Means      -.10       .00      +.10


and the SS are computed as

SS between = 0.10000000 - 0.00000000
           = 0.10000000

SS within  = 0.10540000 - 0.10000000
           = 0.00540000

The actual results produced by BALANOVA 5 were

SS between = 0.099999998         (8 significant digits)
SS within  = 0.0053999992        (7 significant digits)

Note  the  great  improvement  in  accuracy. 
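
The effect can be reproduced on a modern machine. The Python sketch below is an illustration under stated assumptions, not the BALANOVA 5 code: IEEE single precision (emulated with the standard `struct` module) stands in for the "about 8 digit" arithmetic of the original computer, so the printed figures will not match the SOUPAC results quoted above. The Group 1 scores are taken as the grand mean 8.90 plus the Group 1 deviation scores listed above, and the helper names `f32`, `sum32` and `sums_of_squares` are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum32(values):
    """Sum the values, rounding to single precision after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def sums_of_squares(groups):
    """Return (SS between, SS within), each formed as a difference of two terms."""
    scores = [x for g in groups for x in g]
    total_term = sum32(f32(x * x) for x in scores)  # sum of squared scores
    group_sums = [sum32(g) for g in groups]
    group_term = sum32(f32(f32(s * s) / len(g)) for s, g in zip(group_sums, groups))
    grand_sum = sum32(scores)
    correction = f32(f32(grand_sum * grand_sum) / len(scores))
    return f32(group_term - correction), f32(total_term - group_term)


raw = [
    [8.77, 8.79, 8.80, 8.82, 8.82],  # Group 1 (assumed: 8.90 + deviation scores)
    [8.88, 8.90, 8.90, 8.91, 8.91],  # Group 2
    [8.96, 8.99, 9.00, 9.02, 9.03],  # Group 3
]
raw = [[f32(x) for x in g] for g in raw]  # store the scores in single precision

scores = [x for g in raw for x in g]
grand_mean = f32(sum32(scores) / len(scores))
deviations = [[f32(x - grand_mean) for x in g] for g in raw]

raw_between, raw_within = sums_of_squares(raw)
dev_between, dev_within = sums_of_squares(deviations)

print("raw scores:       ", raw_between, raw_within)
print("deviation scores: ", dev_between, dev_within)
```

With raw scores both terms of each difference are near 1188, where adjacent single-precision numbers are about 0.00012 apart, so neither 0.1000 nor 0.0054 can be reproduced closely. With deviation scores the terms are near 0.1 and the differences retain several more correct digits.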

As a final feature of BALANOVA 5, the approximate number of significant
digits in each SS is calculated and printed alongside each SS. These numbers
should not be interpreted exactly but only as a warning when they are small.
The approximate number of significant digits is calculated in the following way:

(a)  Find the largest term, in absolute value, entering into the
calculation of the SS. In the example above, the largest
term in SS within is the first term (0.10540000).

(b)  Take the ratio of this largest term to the SS itself. In the
example, this ratio = 0.10540000/0.00540000.

(c)  The approximate number of significant digits is then 8.0 - log10
(ratio). In the example, this is 8.0 - 1.38 = 6.62, which is
printed by BALANOVA 5 as 7, a pretty good estimate.

Note  that  the  number  of  significant  digits  printed  by  BALANOVA  5  reflects 
the  loss  of  accuracy  in  the  computation  of  the  SS  from  two  or  more  terms. 
It  does  not  reflect  loss  of  accuracy  due  to  computation  of  the  terms  them- 
selves. 
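
Steps (a) to (c) can be written out as a short calculation. The Python sketch below is only an illustration of the rule as stated; the function name `approx_significant_digits` and its `machine_digits` argument are hypothetical, and the SS is assumed to be non-zero.

```python
import math


def approx_significant_digits(terms, machine_digits=8.0):
    """Estimate the significant digits left in an SS formed from signed terms.

    terms: the signed terms whose sum is the SS, e.g. [0.1054, -0.1000].
    """
    ss = sum(terms)                                    # the SS itself
    largest = max(abs(t) for t in terms)               # step (a)
    ratio = largest / abs(ss)                          # step (b)
    return machine_digits - math.log10(ratio)          # step (c)


digits = approx_significant_digits([0.1054, -0.1000])
print(round(digits, 2), "printed as", round(digits))
```

Evaluated directly, log10(0.1054/0.0054) is about 1.29, which gives roughly 6.71 rather than the 6.62 quoted in step (c); either value is printed as 7.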

4.3  Error conditions

BALANOVA 5 makes a detailed check to ensure that the design is legal,
that none of the computer storage arrays are exceeded and that all the data
corresponds to cells within the specified design. The following general
types of errors are distinguished and corresponding error messages are
printed giving detailed instructions about how to correct the error.

1.  One  of  the  restrictions  on  program  size  has  been  exceeded.   These 
restrictions  are: 

(a)  maximum number of factors = 10

(b)  maximum number of legal sources = 100

(c)  maximum size of X-storage array (used for data, means and
cell numbers) = 10,000

(d)  maximum  number  of  dependent  variables  =  200 

(e)  maximum  number  of  sigma-squared  terms  in  any  one  expected 
mean  square  =  10 

2.  The factor specification cards are incorrect or inconsistent.
That is, the design is illegal. The checks made are:

(a)  all  nested  factors  must  be  listed  as  a  factor. 

(b)  no  factor  may  be  nested  within  itself. 

(c)  at  most  one  factor  can  be  the  replication  factor. 
Furthermore,  the  replication  factor  must  be  nested 
in  at  least  one  other  factor  and  no  factor  can  be 
nested  in  the  replication  factor. 


(d)  the  factor  type  must  be  fixed  or  random. 

(e)  the  maximum  number  of  levels  for  each  factor  must  be 
more  than  one. 

(f)  there must be at least one denominator term in the
analysis of variance summary table. If this is not
the case it is probably due to no factor being designated
as a random factor.

3.  A Data Card has a level set which exceeds the limits stated in
the  maximum  number  of  levels  on  the  Factor  Specification  Cards. 

4.  Once the data for a dependent variable has been read in, a
detailed  check  is  made  to  insure  that  all  cells  in  the  design  are  filled. 
If  one  cell  is  not,  the  calculation  for  that  design  is  deleted  and  the 
program  moves  on  to  the  next  dependent  variable  after  printing  sufficient 
information  for  the  user  to  locate  the  missing  datum.   An  additional  check 
for  Class  B  and  C  designs  is  made  to  insure  that  data  for  a  given  subscript 
set  is  not  read  in  twice.   If  two  data  cards  specify  the  same  level  set, 
a  comment  is  made  to  this  effect  and  the  calculations  for  the  dependent 
variable  are  deleted.   Note  that  both  the  checks  mentioned  in  this  paragraph 
are  made  independently  for  each  dependent  variable  and  are  made  after  the 
missing  data  (blank  fields)  for  that  dependent  variable  have  been  deleted. 
Errors  referred  to  in  this  paragraph  are  not  fatal  and  the  program  proceeds 
to  the  next  dependent  variable. 

5.  About one dozen other checks are made. They should always be
passed satisfactorily since the design is first checked as above. These
additional checks were inserted to assist in debugging the program, and
if one of them fails it indicates a remaining error in the program. A
message to this effect is printed in these cases.
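
The two checks in item 4 above (every cell filled, and no level set read twice) can be sketched as follows for a fully crossed design without a replication factor, such as the Class C example of Section 3.2. This Python sketch is an illustration only and does not describe BALANOVA 5's internal logic; the function name `check_cells` is hypothetical.

```python
from collections import Counter
from itertools import product


def check_cells(level_counts, rows):
    """Check the level sets read for one dependent variable.

    level_counts: number of levels of each factor, e.g. (5, 4).
    rows: one tuple of factor levels per data card with a non-blank score.
    Returns (missing cells, level sets that occur more than once).
    """
    seen = Counter(rows)
    expected = product(*(range(1, n + 1) for n in level_counts))
    missing = [cell for cell in expected if cell not in seen]
    duplicated = sorted(cell for cell, n in seen.items() if n > 1)
    return missing, duplicated


# A 2 x 2 layout in which cell (2, 1) is empty and cell (2, 2) was read twice.
missing, duplicated = check_cells((2, 2), [(1, 1), (1, 2), (2, 2), (2, 2)])
print("missing:", missing, "duplicated:", duplicated)
```

As in the program, such a check would be repeated for each dependent variable after cards with a blank field for that variable have been dropped.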

4.4  Program Checkout

BALANOVA  5  has  been  checked  on  a  large  number  of  designs.   Among  these, 
the  following  calculations  were  reproduced  by  BALANOVA  5: 

1.  Lindquist,  p.  266,  Class  A. 

2.  Winer, Table 7.8-3 (p. 376), both least-squares and unweighted
means, Class B.

4.5  Number of levels of the replication factor

The rules in Section 1.5 and Chapter 3 are strict in the sense that,
if they are followed, BALANOVA 5 will execute correctly. However, the rules
may be relaxed or ignored in the case of the number of levels of the
replication factor and it is sometimes convenient to do so.

In Class A designs, any number of levels of the replication factor may
be punched on the Factor Specification Card provided the number is ≥ 2. This
is so because in Class A designs only cell means are stored and the program
does not check the replication number anyway. This rule relaxation is
useful when it is inconvenient for the user to calculate ahead of time how many
subjects are in each cell.

In Class B designs, any number of levels of the replication factor may
be punched on the Factor Specification Card provided

(a)  the number is ≥ the maximum number of replications
in any one nest, and

(b)  the number is not so large that the restriction on
the size of the X matrix (10,000) is exceeded.

Again this rule relaxation saves the user from having to know the maximum
number of replications before using BALANOVA 5, provided he knows an upper
limit. The restriction on the size of the X matrix will not often be
exceeded. However, the user has not been informed, in this manual, how to
estimate the size of the X matrix needed, since this limit is complicated
to specify.

Appendix. Key to Designs in Winer and Lindquist
OMITTED TEMPORARILY


BINORMAMIN 

General  Description 

BINORMAMIN  ROTATION  rotates  a  matrix,  F,  of  orthogonal  factor  loadings 
to  oblique  simple  structure. 

It  does  this  by  iterating  for  T  in  FT  =  A  (where  A  is  the  rotated  factor 
pattern)  so  as  to  minimize: 

K = \sum_p K_p = \sum_p \sum_{q \ne p} \frac{\sum_j (v_{jp}^2 / h_j^2)(v_{jq}^2 / h_j^2)}{\left(\sum_j v_{jp}^2 / h_j^2\right)\left(\sum_j v_{jq}^2 / h_j^2\right)}

Since  solving  directly  for  K  is  too  complex,  BINORMAMIN  takes  one  vector  at 
a  time,  rotating  it  against  all  the  others,  to  minimize  each  Kp. 

Its name comes from the fact that it uses a double (BI) NORMAlization
in seeking a MINimum.

For  further  information  see : 

1.  Kaiser, H. F. and Dickman, K. W., "Analytic Determination of
Common Factors". Unpublished manuscript, University of
Illinois, 1959.

2.  Harman, H. H., Modern Factor Analysis. Chicago, University
of Chicago Press, 1960. pp. 326ff.

Restrictions 

Input is limited to matrices of 150 x 30 or less.

Parameters

After  the  program  name,  BINORMAMIN,  are  the  following  parameters: 

Parameter
Number      Use or Meaning

1           Input Address of factor matrix, F.

2           Output Address of factor matrix, F.

3           Output Address of the transformation matrix, T.

4           Output Address of the reference vector structure, V.

5           Output Address of the correlations between reference vectors.

6           Output Address of the primary factor pattern, P.

7           Output Address of the correlations between factors.

8           Maximum number of iterations (see note). If blank, the
            maximum will be set at 100 iterations.

9           Convergence criterion (see note).

            A.  Defined zero change: iterating will stop when each
                element in V changes by less than A. (A must be
                less than .2 and no less than .0000001.)

            B.  Defined zero rotation: iterating will stop when each
                vector in T changes by less than θ, where θ is the
                angle whose cosine is B. (B must be less than 1.0
                and no less than .2.)

            If left blank, A will be set to .001.

10          If an initial T is to be read in, input address of T.

11          Output Address of the initial T.


Note on Output:

A.  Any output option left blank will not be output.

B.  The program will always print out the program name and the
    number of iterations actually done.

C.  The program will print out the largest change in V, unless
    option 9B is used.

D.  All data printed out is to 7 decimal places.

Note on parameters 8 and 9:  Program will stop at whichever criterion it
meets first.

Note on parameter 9:  This parameter is a floating point constant and
therefore must be enclosed in asterisks, with a decimal point, as in the example:

Example:   BINORMAMIN(CARDS)()()(SEQ 1/PRINT)()()(PRINT)()*.0001*.

Stores V on SEQ 1, and also prints V and the correlations between factors. On the
last iteration, no element in V changed by more than .0001, unless the maximum of
100 iterations was reached.


BISERIAL CORRELATION

General Description

This program calculates the following coefficients for each
combination of one dichotomous and one continuous variable.


Case totals:

    % cases in p = N_p / N
    % cases in q = N_q / N
    Total cases  = N

Mean:

    \bar{X}_p = \sum X_p / N_p
    \bar{X}_q = \sum X_q / N_q
    \bar{X}   = \sum X / N

Standard deviation:

    S_p^2 = \sum X_p^2 / N_p - \bar{X}_p^2
    S_q^2 = \sum X_q^2 / N_q - \bar{X}_q^2
    S^2   = \sum X^2 / N - \bar{X}^2

Biserial r:

    r_{bis} = \frac{(\bar{X}_p - \bar{X}_q) \, p q}{S \, (.3989) \, h}

where p = percentage of cases in 0 category
      q = percentage of cases in 1 category
      h = height of the normal curve computed from normal tables

The  program  checks  for  missing  data,  and  computes  the  above  measures 
only  for  those  cases  where  both  dichotomous  and  continuous  variables  are 
present. 
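
For readers who want to check a result by hand, the Python sketch below computes a biserial r from the quantities listed above. It is an illustration under assumptions, not the program's code: the product (.3989) h is taken to be the ordinate of the standard normal curve at the point dividing the proportions p and q, the sign follows the difference (mean of 0 category) minus (mean of 1 category), missing values are represented by `None`, and the function name `biserial_r` is hypothetical.

```python
from statistics import NormalDist


def biserial_r(dichotomous, continuous):
    """Biserial correlation of a 0/1 variable with a continuous variable.

    Cases in which either value is missing (None) are skipped.
    """
    pairs = [
        (d, x)
        for d, x in zip(dichotomous, continuous)
        if d is not None and x is not None
    ]
    n = len(pairs)
    x_p = [x for d, x in pairs if d == 0]   # scores in the 0 category
    x_q = [x for d, x in pairs if d == 1]   # scores in the 1 category
    p, q = len(x_p) / n, len(x_q) / n

    mean_p = sum(x_p) / len(x_p)
    mean_q = sum(x_q) / len(x_q)
    mean = sum(x for _, x in pairs) / n
    s = (sum(x * x for _, x in pairs) / n - mean * mean) ** 0.5

    # Ordinate of the standard normal curve at the point cutting off proportion p.
    normal = NormalDist()
    ordinate = normal.pdf(normal.inv_cdf(p))
    return (mean_p - mean_q) * p * q / (s * ordinate)


r = biserial_r([0, 0, 0, 1, 1, 1], [1, 3, 2, 4, 2, 3])
print(round(r, 4))
```

In this small example the 1 category has the higher mean, so the coefficient is negative under the sign convention assumed here.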

Restrictions 

A.  Dimension 

The  input  for  the  BISERIAL  R  program  is  limited  to  a  maximum  of 
20  dichotomous  variables  and  100  continuous  variables. 

B.  Type  of  input 

Data  is  read  row-wise  from  either  tape  or  cards.   All  dichotomous 
variables  must  be  first  in  each  row.   They  should  be  coded  with 
0 and 1.

Parameters 


The program call card takes the following parameters after the program name,
BISERIAL R:

Parameter
Number      Use or Meaning

1           Input Address. CARDS or SEQUENTIAL 1-15.

2           Output Address of correlations. SEQUENTIAL 1-15
            and/or PRINT.

3           Number of dichotomous variables.

4           Number of continuous variables.

5           0 - ignore blanks
            1 - count blanks as zeros

October  7,  1969 

SOUPAC (Statistically Oriented Users Programming and Consulting)


CANONICAL  ANALYSIS 

I.   GENERAL  DESCRIPTION 

A.  Mathematics 

The Canonical Correlation program provides a multivariate test of the
hypothesis that two sets of normally distributed variables are independent.
The larger set of variables (the predictor variables) is considered to
have q members, and the smaller set (criterion variables) has p members.
This program also linearly transforms each set of variables into a new set
of independent variables (or dimensions) such that the first new predictor
(a linear combination of the original predictors) has maximum correlation
with the first new criterion variable. The second new predictor is
maximally correlated with the second new criterion, and so on (with the
constraint that each new variable is uncorrelated with the previous new
variables derived from the same set of original variables).

Let the criterion set consist of the p variables x_1, ..., x_p and
the predictor set of q other variates x_{p+1}, ..., x_{p+q}. Assume p \le q. We
then look for weighting matrices c_{a\beta} and d_{ab} such that

    \xi_a = \sum_{\beta} c_{a\beta} x_\beta ,     a = 1, 2, ..., p ;   \beta = 1, 2, ..., p

    \eta_a = \sum_{b} d_{ab} x_b ,     a = 1, 2, ..., p ;   b = p+1, p+2, ..., p+q

The variables \xi and \eta have the following properties:

1.  They are standardized variables.

2.  Within each set, the \xi's are independent and the \eta's are independent.

3.  Within the set of \xi_a (as a runs from 1 through p) and within the set of \eta_a
(as a runs from 1 through p) the correlation is zero. These matrices
are formed in such a way that \xi_1 and \eta_1, \xi_2 and \eta_2, etc. are maximally
correlated, giving p correlations \lambda_1, ..., \lambda_p.

The purpose of this program is to find the p correlations \lambda_1, ..., \lambda_p and
the weighting matrices c_{a\beta} and d_{ab}.

B  Procedure 

The  correlation  matrix  R  is  first  partioned  into: 


where  A  =  correlation  among  predictors 

S=  correlation  between  predictors  and  criterif 
-  correlation  among  criteria 


The matrix equation

$$(C A^{-1} C' - \lambda^2 B)\, U = 0$$

is solved to find the p correlations $\lambda_1, \ldots, \lambda_p$ and the criteria weighting
matrix U whose elements are $c_{a\beta}$.  The eigenvalues D and eigenvectors H of B
are found; $C A^{-1} C'$ is then post-multiplied by $H D^{-1/2}$ and pre-multiplied by
its transpose to reduce the equation to the standard form $(Z - \lambda^2 I)U = 0$.
Then the predictor weighting matrix V whose elements are $d_{ab}$ is computed by
the equation

$$V = (A^{-1} C' U)\, \Lambda^{-1}$$

where $\Lambda$ is the diagonal matrix of the correlations $\lambda_1, \ldots, \lambda_p$.
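The procedure can be illustrated for the smallest interesting case, two predictors and two criteria. The sketch below is only an illustration under stated assumptions: it finds the roots of the matrix equation from the closed-form eigenvalues of the 2 x 2 matrix $B^{-1} C A^{-1} C'$ rather than by the reduction through the eigenvectors of B that the program uses, and the function names are hypothetical, not part of SOUPAC.

```python
import math


def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(m):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def canonical_correlations_2x2(A, B, C):
    """Canonical correlations for two predictors and two criteria.

    A: correlations among predictors, B: correlations among criteria,
    C: correlations between criteria (rows) and predictors (columns).
    """
    # The roots lambda^2 of (C A^-1 C' - lambda^2 B) U = 0 are the
    # eigenvalues of M = B^-1 C A^-1 C'.
    M = matmul(inv2(B), matmul(matmul(C, inv2(A)), transpose(C)))
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    roots = [(trace + disc) / 2.0, (trace - disc) / 2.0]
    # Largest correlation first; guard against tiny negative round-off.
    return [math.sqrt(max(r, 0.0)) for r in roots]
```

With uncorrelated variables inside each set, the canonical correlations should simply be the between-set correlations, which gives a quick check of the sketch.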


II.   INPUT 

Input  to  the  CANONICAL  ANALYSIS  program  consists  of  a  correlation  matrix. 
These  variables  include  a  set  of  predictor  variables  and  a  set  of  criterion 
variables.   Either  set  may  be  first  on  the  input  data  but  there  can  be  no 
mixing  of  the  two  types  of  variables  on  the  input  data.   The  TRANSFORMATION 
program  may  be  used  to  reorder  the  variables  if  they  are  mixed  on  the 
card  data  deck. 

III.   SIGNIFICANCE  TESTS 

Included in the printed output of the CANONICAL program is a Chi-square
value for each of the eigenvalues $\lambda^2$ computed in the program.  The chi-square
values printed are determined from the Wilks' lambda values using the
procedure outlined by Bartlett (see Section IX).  The chi-square values
provide a test of the null hypothesis that the p variates are unrelated to
the q variates.  If there is at least one way in which a linear combination
of  the  predictor  variables  is  correlated  with  a  linear  combination  of 
the  criterion  variables  this  Chi-square  value  will  be  significant.   The 
second  Chi-square  may  then  be  examined.   This  Chi-square  is  a  test  of 
a  second  relationship  after  the  first  relationship  has  been  removed.   If 
this  Chi-square  is  significant  a  second  linear  combination  of  the  predictor 
variables  is  correlated  with  a  second  linear  combination  of  the  criterion 
variables.   This  process  continues  until  the  first  non-significant 
Chi-square  is  found.   All  Chi-squares  beyond  that  point  will  be  non- 
significant. 
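The successive tests can be sketched as follows. This is a hedged illustration, not the program's code: it assumes the commonly cited form of Bartlett's approximation, in which Wilks' lambda for the roots remaining after the first k have been removed is the product of $(1 - \lambda_i^2)$ over those roots, the chi-square is $-[N - 1 - (p + q + 1)/2] \ln \Lambda$, and the degrees of freedom are $(p - k)(q - k)$. The sample size `n_obs` is an assumed input, since the program itself reads only a correlation matrix.

```python
import math


def bartlett_chi_squares(correlations, n_obs, p, q):
    """Return (wilks_lambda, chi_square, df) for each successive test.

    correlations: canonical correlations, largest first.
    n_obs: number of observations behind the correlation matrix (assumed known).
    p, q: numbers of criterion and predictor variables.
    """
    results = []
    for k in range(len(correlations)):
        # Wilks' lambda for the roots left after the first k are removed.
        wilks = 1.0
        for lam in correlations[k:]:
            wilks *= 1.0 - lam * lam
        chi_square = -(n_obs - 1 - (p + q + 1) / 2.0) * math.log(wilks)
        df = (p - k) * (q - k)
        results.append((wilks, chi_square, df))
    return results
```

Each chi-square would then be referred to a chi-square table with the listed degrees of freedom, stopping at the first non-significant value.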

IV.   OUTPUT

The output consists of the following:

1.   The matrix of standardized regression coefficients.  This is the matrix
of coefficients which would be formed if the raw data used to calculate
the correlation matrix input had been converted to standard scores.
The predictor variables are on the rows of the matrix and the criterion
variables are on the columns of the matrix.

2.   A multiple correlation squared ($R^2$) for each of the criterion variables.
The first $R^2$ value is the squared multiple correlation of the first criterion
variable with the entire set of predictor variables.  The second is
for the second criterion variable with the set of predictors, etc.

3.   A set of eigenvalues $\lambda^2$, correlations $\lambda$, Wilks' lambdas, Chi-squares,
and degrees of freedom.  (See Section III)  (Printed)

4.   A matrix of criterion weights.  (Parameter Number 3)

5.   A matrix of predictor weights.  (Parameter Number 4)

V.  RESTRICTIONS 

A.  Dimension 

The  maximum  permissible  number  of  variables  in  each  set  is  80. 

B.  Special  Conditions 

The matrices A and B must both be non-singular.

C.  Number of criteria (p) must be less than or equal to the number of
predictors (q).

VI.  PARAMETERS 

The  parameters  for  the  CANONICAL  ANALYSIS  program  follow  the  program 
name  on  the  main  program  card.   Each  parameter  must  be  enclosed  in 
parentheses.  The  parameters  must  appear  in  the  order  given  below.   If 
a  parameter  is  not  needed,  do  not  punch  anything  between  its  parentheses. 
All  parentheses  after  the  last  non-empty  pair  may  be  omitted. 

Parameter
Number      Use or Meaning

  1         Input Address (correlation matrix).  CARDS
            or SEQUENTIAL 1-5.

  2         Output Address of canonical correlations.
            SEQUENTIAL 1-5 and/or PRINT.

  3 †       Output Address of criterion weighting
            matrix.  SEQUENTIAL 1-5 and/or PRINT.

  4 †       Output Address of predictor weighting
            matrix.  SEQUENTIAL 1-5 and/or PRINT.

  5         Number of predictor variables.

  6         Number of criterion variables.  (Must
            be less than or equal to number of
            predictors).

  7         Order of variable sets on input:
              1 if predictors are first
              2 if criteria are first

  8         Operation to be performed:
              1 if only canonical correlations
              2 if weights are to be computed

  9         1 if want regression coefficients printed

  10        1 if want multiple correlation squared
            ($R^2$) printed


† It is possible to print in F format and/or punch the output from these
parameters.  If you need either of these options, see the section in the
Introduction on Input and Output.

VII.   SPECIAL  COMMENTS 

This  program  does  not  check  for  missing  data.   All  blank  spaces  are 
read  as  zeros. 


VIII.   EXAMPLES

1A

/*ID  <accounting information>
//  EXEC  SOUP
//SOUP.SYSIN  DD  *
CANONICAL (CARDS)(SEQUENTIAL 1/PRINT)(PRINT)(PRINT)(15)(5)(2)(1).
ENDS
DATA
END#
/*

1B

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
CAN (C)(S1/P)(P)(P)(15)(5)(2)(1).
ENDS
DATA
END#
/*


Examples 1A and 1B illustrate the use of the program call card for
a CANONICAL correlation program.  In these examples the input data consists
of 15 predictor and 5 criterion variables and will be a card deck.
The predictor variables are the first 15 variables of each observation
in the data deck.  The printed output will be the canonical correlations,
the criterion weighting matrix, and the predictor weighting matrix.  The
canonical correlation matrix will also be stored on temporary storage
SEQUENTIAL 1.  Examples 1A and 1B perform the same calculations.

IX.   REFERENCES 

For a discussion of the uses of canonical analysis, see Kendall, M.G.,
A Course in Multivariate Analysis, New York, Hafner, 1961, pp. 68-85
or Kendall, M.G., The Advanced Theory of Statistics, New York, Hafner,
1951, Vol. II, pp. 348-358.

For the derivation of the method used, see Anderson, T. W., An
Introduction to Multivariate Statistical Analysis, New York, Wiley, 1958,
pp. 288-296.

For significance test procedure, see Cooley, W. W. and Lohnes, P. R.,
Multivariate Procedures for the Behavioral Sciences, New York,
Wiley, 1962, p. 37.


CENTROID  FACTOR  ANALYSIS 


I.   General  Description 

CENTROID FACTOR ANALYSIS computes a set of f linearly independent
vectors (factors) which are mutually uncorrelated.  Normally, a factor
analysis decomposes a matrix of correlations, $R_n$, into a set of f
factors.  The factors are arrayed as column vectors in the factor matrix,
F, such that

$$R_n = FF' + R_{n-f}$$

where $R_{n-f}$ is the matrix of residual effects.  The k-th factor is computed
by dividing the column sums of $R_{n-k}$ by the square root of the total sum of
elements of $R_{n-k}$:

$$f_{i,k} = \frac{\sum_{j} r_{ij}^{(k)}}{\sqrt{\sum_{i}\sum_{j} r_{ij}^{(k)}}}$$

Between  each  factor  extraction,  the  variables  in  the  residual  matrix 
are  successively  reflected  until  all  the  column  sums  are  positive. 
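A single extraction step can be sketched as below. This is a minimal illustration that assumes the column sums are already positive, so the reflection of variables between extractions is left out; the function name is hypothetical.

```python
import math


def centroid_factor(residual):
    """Extract one centroid factor from a symmetric matrix (nested lists).

    Returns the loadings and the residual matrix left after the factor
    is removed.  Assumes every column sum is already positive.
    """
    n = len(residual)
    column_sums = [sum(residual[i][j] for i in range(n)) for j in range(n)]
    total = sum(column_sums)
    if total <= 0:
        raise ValueError("total sum of elements must be positive")
    root = math.sqrt(total)
    # Loading of each variable: its column sum over the root of the grand sum.
    loadings = [s / root for s in column_sums]
    # Residual matrix: original elements minus the products of the loadings.
    new_residual = [[residual[i][j] - loadings[i] * loadings[j]
                     for j in range(n)] for i in range(n)]
    return loadings, new_residual
```

Calling the function again on the returned residual (after any needed reflection) would give the next factor.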

For  more  detailed  discussion  see: 

1.  L. L. Thurstone, Multiple Factor Analysis, Chicago,
University of Chicago Press, 1947, pp. 149-175.

2.  Harry Harman, Modern Factor Analysis, Chicago,
University of Chicago Press, 1960, pp. 192-215.

II.   Restrictions

The input matrix for the CENTROID program must not exceed the dimensions
of 190 x 190.  The input matrix is further limited to being a square,
positive definite or semi-definite, symmetric matrix.  Commonly, correlation,
covariance, or cross-product matrices are used as input data.  Any attempt
to introduce communality estimates (change the diagonal elements) must be
made before data is passed to the CENTROID program.  A set of incorrectly
estimated communalities can make the matrix non-positive and could
conceivably cause a hang-up.

The input data may come from any storage medium which conforms to SOUPAC.
Similarly, the output codes follow the established conventions and are at
the option of the user.

The  input  matrix  may  be  completely  factored  (i.e.,  N  factors  from  a 
N  variable  matrix) .   However,  factoring  may  be  stopped  by  any  of  three 
criteria: 

1.   The  user  may  specify  the  number  of  factors  to  be  extracted. 
This  criterion  provides  an  upper  limit  beyond  which  factoring 
will  not  be  done.   Consequently,  it  is  advisable  to  put  the 
maximum  value  on  this  limit  in  cases  where  it  is  not  the 
primary criterion.  (Set it equal to the number of variables).

2.   The per cent of total variance removed from the R matrix is
a second limiting criterion.  This parameter also specifies
an upper limit to the process.  Therefore, it should be set
at 100 per cent unless it is the criterion for stopping.

3.   The last criterion is to stop when the factor contribution falls
below 1.  The use of this procedure is dictated by the presence
or absence of its associated parameter.

If all three criteria are used simultaneously, factoring will be stopped by
whatever criterion is met first.


III.   Parameters 


Following the program name on the program call card come the parameters
needed by the program.  The parameters must appear in the order below:

Parameter
Number      Use or Meaning

  1         Input Address.  CARDS or SEQUENTIAL 1-15.

  2         Output Address.  SEQUENTIAL 1-15 and/or PRINT.

  3         Maximum number of factors to be extracted.
            This must be less than or equal to the
            order of the input matrix.

  4         Per cent of total variance to be removed
            expressed as an integer between 0 and 100.

  5         The presence of any number greater than 0
            in this parameter indicates that factoring
            should stop when the factor contribution
            falls below unity.

  6         Output Address of Residual Matrix.

If parameters 3 and 4 are left blank then by default option they will be
set to maximum possible values and a message will be printed.

Residual  Matrix  must  be  stored  before  it  can  be  printed. 

Example: Assume that you have 77 variables and that the correlation matrix
is stored on SEQ 1; then legal forms of the CENTROID call statement may be:

CENTROID (SEQ 1)(PRINT)(77)(100)(1).
CENTROID (SEQ 1)(P(F))(50)(80).
CENTROID (SEQ 1)(SEQ 2/P)(20)(100)(1)(SEQ 3/P).
CENTROID (SEQ 1)(SEQ 2/P(F))(15)(90)()(SEQ 3/P(F)).
CENTROID (SEQ 1)().

In this last case, number of factors = 77 and per cent of variance = 100
will be assumed by default.


CLASSIFICATION


General  Description 

The CLASSIFICATION program is designed to measure individuals against
previously determined groups in order to determine probable group
membership.  The classification is done in a reduced test space derived from
discriminant analysis.  The method of classification is based on the
premise that a group is totally described by its mean (or centroid) and
dispersion; the individual's relation to each group is determined by a
$\chi^2$ which indicates how many members of the group are farther from the
centroid than he, and a Bayesian probability of membership in the group
based on this $\chi^2$.  For each individual, the $\chi^2$ and probability for each
group are given; the user then applies a decision rule of his choice
for assigning individuals to groups.
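The two quantities can be sketched as below. The write-up does not give the program's exact formulas, so this is an assumption-laden illustration: the $\chi^2$ is taken as the quadratic form $(x - m)' D^{-1} (x - m)$ in the reduced space, and the probability as the group size times the normal density at x, divided by the sum of that product over all groups. The dictionary keys and function names are hypothetical.

```python
import math


def quadratic_form(x, centroid, inverse_dispersion):
    """Chi-square distance (x - m)' D^-1 (x - m) of a score vector from a centroid."""
    d = [xi - mi for xi, mi in zip(x, centroid)]
    return sum(d[i] * inverse_dispersion[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))


def classify(x, groups):
    """Return a (chi_square, probability) pair for each group.

    Each group is a dict with hypothetical keys: 'centroid',
    'inverse_dispersion', 'determinant' (of the dispersion matrix)
    and 'size' (number of members in the group).
    """
    chi_squares = [quadratic_form(x, g["centroid"], g["inverse_dispersion"])
                   for g in groups]
    # Assumed weighting: group size times the normal density at x.
    weights = [g["size"] * math.exp(-0.5 * c) / math.sqrt(g["determinant"])
               for g, c in zip(groups, chi_squares)]
    total = sum(weights)
    return [(c, w / total) for c, w in zip(chi_squares, weights)]
```

A user's decision rule, for example assigning each individual to the group with the largest probability, would be applied to the returned pairs.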

Since analysis is to be performed in a reduced space, the means and
dispersion of each group must also be reduced to this space.  The
CLASSIFICATION program performs these reductions.

The calculation of probabilities requires the specification of the
number of members in each group against which the individual is being
compared.  This may be the numbers actually in the groups used for
finding experimental means and standard deviations or the number of
individuals from the total group being tested who are to be assigned
to each group.  These numbers are specified as subparameters.

In  every  case  the  input  is  expected  in  the  form  in  which  it  is 
output  by  the  DISCRIMINANT  ANALYSIS  program.   Input  is  limited  to  16 
groups  and  50  variables. 

Parameters 

The  program  name  CLASSIFICATION  is  followed  by  these  parameters  on 
the  program  call  card: 

Parameter
Number      Use or Meaning

  1         Input Address of discriminant vectors.
            The vectors are expected as columns.
            CARDS or SEQUENTIAL 1-15.

  2         Input Address of means.  The means for a
            given group are expected as a row.  CARDS
            or SEQUENTIAL 1-15.

  3         Input Address of dispersion matrices.  The
            dispersion matrices are expected in a
            vertically augmented form.  CARDS or
            SEQUENTIAL 1-15.

  4         Input Address of individual scores.  CARDS
            or SEQUENTIAL 1-15.

  5         Output Address for inverse of dispersion
            matrix.  PRINT or SEQUENTIAL 1-15.

  6         Output Address for $\chi^2$ and probabilities for
            each subject.  SEQUENTIAL 1-15 and/or PRINT.
            (Output on SEQUENTIAL is in the form

            $N_i, X_{i1}, \ldots, X_{in}, Y_{i1}, \ldots, Y_{in}$

            where $N_i$ = sequential subject number in groups
                  $X_{ij}$ = j-th probabilities for i-th subject
                  $Y_{ij}$ = j-th $\chi^2$ for i-th subject)

  7         Number of groups.

  8         Number of variables.

  9         Number of discriminant vectors.

  10        Input Address of group sample sizes.
            CARDS or SEQUENTIAL 1-15.


CLIQUE ANALYSIS


General  Description 

This  routine  is  designed  to  enumerate  all  third  order  or  higher 
interrelationships  (communication  chain)  which  exist  in  a  sociometric 
matrix.   The  algorithm  is  identical  to  the  method  described  by  Harary 
and Ross (1).  A communication chain is considered to be any submatrix of
order  three  or  more  in  which  all  the  off  diagonal  cells  are  full. 

Restrictions 

The  maximum  dimensions  for  an  input  array  is  190  x  190.   Input  may 
come  from  cards  or  any  temporary  storage  area.   The  array  must  contain 
only  zeroes  and  ones  in  its  elements.  Any  number  greater  than  zero  is 
considered  to  be  one;  therefore,  care  should  be  used  in  constructing  the 
array.   Symmetry  in  the  input  matrix  is  not  necessary  since  the  program 
automatically  forces  symmetry  through  element-wise  products.   It  is 
suggested  that  TRANSFORMATIONS  be  used  to  modify  input  arrays  when  various 
cut-off  points  are  used  to  distinguish  ones  from  zeroes. 
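The clique concept described above can be sketched in a few lines. Note the swapped-in technique: this illustration uses the Bron-Kerbosch recursion to list maximal complete subsets, not the group-matrix procedure of Harary and Ross that the routine itself follows. The function name is hypothetical; the treatment of entries greater than zero and the forcing of symmetry follow the restrictions stated above.

```python
def find_cliques(matrix, min_order=3):
    """List maximal cliques of order >= min_order, members numbered from 1."""
    n = len(matrix)
    # Any entry greater than zero counts as one; a link is kept only when it
    # is present in both directions (element-wise product with the transpose).
    neighbors = [
        {j for j in range(n)
         if j != i and matrix[i][j] > 0 and matrix[j][i] > 0}
        for i in range(n)
    ]
    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when nothing is left to add or to rule out.
        if not candidates and not excluded:
            if len(current) >= min_order:
                cliques.append(sorted(v + 1 for v in current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & neighbors[v],
                   excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(range(n)), set())
    return sorted(cliques)
```

In the small matrix used to check the sketch, a one-way choice from member 1 to member 4 is dropped by the symmetry rule, so only the two mutual triads are reported.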

Parameters 

The  name  CLIQUE  ANALYSIS  appears  first  on  the  program  call  card  and 
is  followed  by  the  following  parameter: 


Parameter
Number      Use or Meaning

  1         Input Address of data array.
            CARDS or SEQUENTIAL 1-15.


Special  Comments 

The following is an illustration of the clique detection concept.

Data matrix:

Clique (1)   1, 2, 3

Clique (2)   8, 6, 7, 9

Clique (3)   4, 3, 5

Clique (4)   3, 5, 6

Clique (5)   4, 5, 7

Clique (6)   5, 6, 7

(1) Harary and Ross, "A Procedure for Clique Detection Using the Group Matrix",
Sociometry, Vol. 20, No. 3, 1957, pp. 205-215.


COMMUNALITY ESTIMATION


General  Description 

Five  methods  of  COMMUNALITY  ESTIMATION  are  offered  in  this  program. 
In  each  case  the  estimates  replace  the  diagonal  elements  of  the  matrix. 
They  are  as  follows : 

Code Number    Method

    1          The element of largest absolute magnitude in
               each row replaces the diagonal element of the
               row.

    2          The square of the multiple R of each variable
               with all others replaces the diagonal entry
               for that variable.

    3          Communalities produced from another analysis
               are to be input from cards or another
               storage medium.

    4          For each row i of the N x N matrix,
               $\left(\sum_{j=1}^{N} r_{ij}^{2} / N\right)^{1/2}$
               replaces the diagonal entry for that row.
               This is the square root of the average
               square across the row.

    5          For each row i of the N x N matrix,
               $r^{*}_{ik}\,(S_i - r^{*}_{ik}) / (S_k - r^{*}_{ik})$
               replaces the diagonal entry for that row,
               where:

               $r^{*}_{ik} = \max_j |r_{ij}|$ and

               $S_i = \sum_j |r_{ij}|$,  $S_k = \sum_j |r_{kj}|$

               This method of COMMUNALITY ESTIMATION is due to
               Professor L. Tucker.


Restrictions

Input is restricted to correlation matrices of order 150 or less.
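Two of the simpler estimates can be sketched as follows. This is an illustration under assumptions, with hypothetical function names: for the largest-element method the diagonal itself is excluded from the search (the write-up does not say so explicitly), and for the root-mean-square method the sum runs over all N elements of the row, as the formula is written.

```python
import math


def largest_off_diagonal(r):
    """Largest absolute off-diagonal element of each row (Code 1)."""
    n = len(r)
    return [max(abs(r[i][j]) for j in range(n) if j != i) for i in range(n)]


def root_mean_square(r):
    """Square root of the average squared element of each row."""
    n = len(r)
    return [math.sqrt(sum(v * v for v in row) / n) for row in r]


def replace_diagonal(r, estimates):
    """Return a copy of r with the diagonal replaced by the estimates."""
    out = [list(row) for row in r]
    for i, h in enumerate(estimates):
        out[i][i] = h
    return out
```

For example, `replace_diagonal(r, largest_off_diagonal(r))` yields a matrix that could then be passed on to a factoring program.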

Parameters 

The  parameters  for  the  COMMUNALITY  ESTIMATION  program  appear  on  the 
program  call  card.   They  must  follow  the  program  name  in  this  order: 

Parameter
Number      Use or Meaning

  1         Input Address.  CARDS or SEQUENTIAL 1-15.

  2         Output Address.  SEQUENTIAL 1-15 and/or PRINT.

  3         Section Code Number.  (See General Description).

  4         Input Address if Option 3 is used.


IV.   Special Comments

If the correlation matrix and communalities both are input from cards,
the correlation matrix precedes the communality estimates.  (See Code
Number 3 in the General Description).


Reference 

Harman, Harry:  Modern Factor Analysis, Chicago, University of Chicago
Press, 1960, pp. 83-90.


CORRELATION


I.   GENERAL  DESCRIPTION 

The main purpose of the CORRELATION program is the calculation of
Pearson product-moment correlations (hereafter referred to as correlations
in this writeup).  A correlation measures the linear dependency between
two variables, and this program calculates a correlation for each pair
of input variables.  The square of a correlation, sometimes called the
coefficient of determination, represents the proportional reduction in
variance of one variable due to a linear relationship with another.  Thus
the coefficient of determination measures the strength of a linear
relationship, or the proportion of variance accounted for by a linear rule.

In the process of calculating the correlations, the means and standard
deviations of the individual variables are computed, as are the
cross-products and covariances between variables.  After the correlations have
been calculated, they are used to calculate the linear regression coefficients
and corresponding intercept terms needed for predicting each variable from
each other variable.

II.   INPUT

Input to the CORRELATION program consists of a set of independent
observations on two or more variables.  The data is considered as a
two-dimensional array (or matrix) of numbers with each column containing
the observations on one variable, and each row consisting of one
observation on each variable.  If we use the letter X to represent the matrix
of raw data, we let $X_{ij}$ represent the element in the i-th row (where
i = 1, 2, ... N) and the j-th column (where j = 1, 2, ... M).  In other words,
we have N observations (rows) and M variables (columns) in our data matrix X.

III.   FORMULAS AND CALCULATIONS

The following formulas define certain statistics and illustrate their
method of calculation within the program.  The subscript i refers to
observations (or individuals) and runs from 1 to N.  The subscripts j and k
refer to variables, and they run from 1 to M.


Mean (of variable j):

$$\bar{X}_j = \frac{\sum_{i=1}^{N} X_{ij}}{N}$$

Covariance* (between variables j and k):

$$C_{jk} = \frac{\sum_{i=1}^{N} (X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k)}{N - 1}
         = \frac{N \sum_{i=1}^{N} X_{ij} X_{ik} - \left(\sum_{i=1}^{N} X_{ij}\right)\left(\sum_{i=1}^{N} X_{ik}\right)}{N(N - 1)}$$

Standard Deviation* (of variable j):

$$S_j = \sqrt{\frac{\sum_{i=1}^{N} (X_{ij} - \bar{X}_j)^2}{N - 1}}
      = \sqrt{\frac{N \sum_{i=1}^{N} X_{ij}^2 - \left(\sum_{i=1}^{N} X_{ij}\right)^2}{N(N - 1)}}
      = \sqrt{C_{jj}}$$

Correlation (between variables j and k):

$$R_{jk} = \frac{C_{jk}}{S_j S_k}
         = \frac{N \sum_{i=1}^{N} X_{ij} X_{ik} - \left(\sum_{i=1}^{N} X_{ij}\right)\left(\sum_{i=1}^{N} X_{ik}\right)}
                {\sqrt{N \sum_{i=1}^{N} X_{ij}^2 - \left(\sum_{i=1}^{N} X_{ij}\right)^2}\;\sqrt{N \sum_{i=1}^{N} X_{ik}^2 - \left(\sum_{i=1}^{N} X_{ik}\right)^2}}$$

From the equation $X_{ij} = B_{jk} X_{ik} + A_{jk}$, the program calculates

Linear Regression Coefficient (for predicting variable j from variable k):

$$B_{jk} = R_{jk}\left(\frac{S_j}{S_k}\right)$$

Intercept (constant term in equation for predicting variable j from variable k):

$$A_{jk} = \bar{X}_j - B_{jk}\,\bar{X}_k$$

*NOTE :  the  sample  covariances  and  sample  standard  deviations  are  unbiased 
estimates  of  the  corresponding  population  parameters.   The  definitions 
given  here  follow  the  practice  of  many  current  statisticians.   [See  Anderson 
(1958)  -  Chapter  3  for  example.] 
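The computational forms of these statistics can be sketched for a single pair of variables. The function and key names below are hypothetical; the arithmetic follows the raw-sum formulas, with N - 1 in the denominators of the covariance and standard deviation.

```python
import math


def pair_statistics(xj, xk):
    """Statistics for predicting variable j (xj) from variable k (xk)."""
    n = len(xj)
    sum_j, sum_k = sum(xj), sum(xk)
    sum_jj = sum(v * v for v in xj)
    sum_kk = sum(v * v for v in xk)
    sum_jk = sum(a * b for a, b in zip(xj, xk))
    denom = n * (n - 1)
    cov = (n * sum_jk - sum_j * sum_k) / denom
    sd_j = math.sqrt((n * sum_jj - sum_j * sum_j) / denom)
    sd_k = math.sqrt((n * sum_kk - sum_k * sum_k) / denom)
    r = cov / (sd_j * sd_k)
    slope = r * sd_j / sd_k                      # B_jk
    intercept = sum_j / n - slope * sum_k / n    # A_jk
    return {
        "mean_j": sum_j / n, "mean_k": sum_k / n,
        "covariance": cov, "sd_j": sd_j, "sd_k": sd_k,
        "correlation": r, "slope": slope, "intercept": intercept,
    }
```

A constant variable makes one standard deviation zero and the division fail, which corresponds to the case the program reports with an error message.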

IV.   SIGNIFICANCE  TESTS 

If we assume that two variables (indexed by j and k) have a bivariate
normal distribution, there is a test statistic for testing the hypothesis
that the correlation in the population is zero (or equivalently that either
regression coefficient is zero).  Even for a relatively small sample size
(N), this hypothesis can be tested using the t ratio:

$$t = \frac{R_{jk}\sqrt{N - 2}}{\sqrt{1 - R_{jk}^{2}}}$$

with N - 2 degrees of freedom.  Other types of hypotheses can be tested
through use of the Fisher R to Z transformation.  [See Hays (1963), pages
529 -     for example.]
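As a small worked sketch of the t ratio (the function name is hypothetical), the statistic and its degrees of freedom can be computed as:

```python
import math


def correlation_t(r, n):
    """Return (t, degrees of freedom) for testing a zero population correlation."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for a correlation of +1 or -1")
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return t, n - 2
```

For a correlation of 0.5 based on 27 observations this gives t = 2.5 / sqrt(0.75) with 25 degrees of freedom, which would then be compared with a t table.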


V.   OUTPUT 


Output from the CORRELATION program may consist of any or all of
the statistics defined in section III above, by using parameters 2 through 7.
The output from this program may be printed and/or output to temporary
storage (SEQUENTIAL 1-5).  The means, standard deviations, and the sample
size are output as a matrix, with M rows (one for each variable) and
one column for each statistic (the sample size column will have a constant
value for all variables).  Correlations, covariances, and cross-products are
printed as lower triangular matrices, while the regression coefficients and
intercepts are printed as square matrices.  However, all five of these
matrices are stored as square matrices.

VI.   RESTRICTIONS

The  CORRELATION  program  will  accept  an  unlimited  number  of 
observations,  but  the  number  of  variables  is  limited  as  noted  in  the 
section  on  PROGRAM  LIMITS  in  the  INTRODUCTION. 

VII.   PARAMETERS 

The  parameters  for  the  CORRELATION  program  follow  the  program 
name  on  the  main  program  card.  Each  parameter  must  be  enclosed  in 
parentheses.  The  parameters  must  appear  in  the  order  given  below. 
If  a  parameter  is  not  needed,  do  not  punch  anything  between  its 
parentheses.  All  parentheses  after  the  last  non-empty  pair  may  be 
omitted. 


Parameter
Number      Use or Meaning

  1         Input Address of raw data (X matrix).
            CARDS or SEQUENTIAL 1-5.

  2         Output Address for means, standard
            deviations, and sample size.
            SEQUENTIAL 1-5 and/or PRINT.

  3 †       Output Address for correlation matrix
            (R).  SEQUENTIAL 1-5 and/or PRINT.

  4 †       Output Address for cross-products
            matrix.  SEQUENTIAL 1-5 and/or PRINT.

  5 †       Output Address for covariance matrix (C).
            SEQUENTIAL 1-5 and/or PRINT.

  6 †       Output Address for matrix of regression
            coefficients (B).  SEQUENTIAL 1-5 and/or
            PRINT.

  7 †       Output Address for intercepts (matrix A).
            SEQUENTIAL 1-5 and/or PRINT.

  8         1 if last variable in each row is a
            weighting factor.


† It is possible to print in F format and/or punch the output from these
parameters.  If you need either of these options, see the section in
the INTRODUCTION on INPUT and OUTPUT.


VIII.   SPECIAL COMMENTS

1.  This  program  does  not  check  for  missing  data.  All  blank  spaces 
are  read  as  zeroes.   If  you  have  missing  data,  use  the  MISSING 
DATA  CORRELATION  program. 

2.  In  the  output  matrices  of  regression  coefficients  and  intercepts, 
the  row  number  refers  to  the  dependent  variables,  and  the  column 
numbers  refer  to  the  independent  variables. 

3.  If  a  variable  is  constant,  an  error  message  will  be  printed  and 
all  correlations  with  that  variable  will  be  set  to  zero. 

4.  In order to have the program perform its calculations separately
for subsamples of the data, see the section on CONTROL VARIABLES
in the INTRODUCTION.

IX.   EXAMPLES

1A

/*ID  <accounting information>
//  EXEC  SOUP
//SOUP.SYSIN  DD  *
CORRELATIONS (CARDS)( )(PRINT)
END SOUPAC
DATA (6)(6F2.0)
END#
/*

1B

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
COR (C)( )(P).
ENDS
DATA (6)(6F2.0)
END#
/*


Example  1A  illustrates  the  usage  of  the  CORRELATION  program.  Notice 
that  all  words  are  spelled  out  although  this  is  unnecessary.  Notice  also 
that  correlations  are  to  be  printed  out,  although  the  means  and  standard 
deviations are not.  Example 1B will perform exactly the same computations
as 1A, except that all instructions have been abbreviated to make keypunching
easier.

2

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
COR (C)(P)(P/S1).
PRINCIPAL AXIS FROM (S1) TO (S2/P) WITH (10) FACTORS AND
(100) PERCENT OF THE VARIANCE TO BE REMOVED.
VARIMAX ROTATION FROM (S2) TO (PRINT).
ENDS
DATA (20)(10F4.0,5F6.2/10X,5F4.1)
END#
/*

In the second example, the CORRELATION program first prints the means
and standard deviations.  Then it prints the CORRELATION matrix and stores
it on SEQUENTIAL 1 (S1).  The PRINCIPAL AXIS program then performs a
principal components analysis and outputs 10 components to S2.  VARIMAX
then rotates these 10 components, using the VARIMAX criterion, and prints
the results.

X.   REFERENCES

T. W. Anderson, An Introduction to Multivariate Statistical Analysis;
John Wiley and Sons, Inc., 1958.

E. C. Bryant, Statistical Analysis; McGraw-Hill, 1960, pp. 113-135.

W. L. Hays, Statistics for Psychologists; Holt, Rinehart and Winston,
1963.


DISCRIMINANT ANALYSIS

I.   GENERAL  DESCRIPTION 

Suppose  that  we  have  k  populations  (groups)  and  p  measures  (variables) 
on  each  member  of  each  population.  We  want  to  test  the  hypothesis  that  our 
groups  are  significantly  different  on  the  entire  set  of  variables.  This 
one-way  multivariate  analysis  of  variance  hypothesis  is  tested  by  this 
program.   The  program  then  locates  the  dimensions  (discriminant  functions) 
along  which  the  group  differences  are  maximum.   Thus,  we  need  some 
function  to  transform  the  p  variates  into  a  smaller  set  of  independent 
measures  which  will  indicate  the  differences  between  the  groups.   The 
DISCRIMINANT  ANALYSIS  program  finds  the  independent  linear  functions  of 
the  variables  which  maximally  discriminate  between  the  populations  (groups 
input) .   The  results  from  this  program,  namely  the  discriminant  functions, 
may  be  used  in  the  CLASSIFICATION  program  to  determine  the  probability  that 
any  subject  belongs  in  any  group.  Also,  by  looking  at  the  coefficients  of 
the  functions,  we  can  determine  to  what  extent  each  of  the  p  variates 
contributes  to  each  function.   In  order  to  do  this  we  need  to  determine  the 
coefficients  of  the  functions  such  that  the  ratio  of  variances  between 
groups  to  the  variances  within  groups  is  maximized,  i.e.  the  differences 
between  groups  are  to  be  large  relative  to  the  differences  within  groups. 

In matrix terms, we are trying to maximize the ratio

$$\lambda_i = \frac{f_i' A f_i}{f_i' W f_i}$$

where $f_i$ is the eigenvector associated with the i-th eigenvalue $\lambda_i$ of $W^{-1}A$,
A = the covariance matrix between means,

$$a_{ij} = \sum_{g=1}^{k} N_g (\bar{X}_{ig} - \bar{X}_i)(\bar{X}_{jg} - \bar{X}_j)$$

and W = the covariance matrix within classes,

$$w_{ij} = \sum_{g=1}^{k} \left[ \sum_{n=1}^{N_g} (X_{ign} - \bar{X}_{ig})(X_{jgn} - \bar{X}_{jg}) \right]$$

where k = number of groups, $N_g$ = number of subjects in group g, N = total
number of subjects, and i and j run from 1 to p, where p = number of
variables.

To find the maximum, we derive from the partial derivatives of that
ratio, the matrix equation

$$(W^{-1}A - \lambda I)\, F = 0$$

where  F  is  the  matrix  of  eigenvectors.   The  eigenvectors  are  the  coefficients 
of  the  discriminant  functions.   The  relative  sizes  of  the  eigenvalues 
indicate  the  extent  to  which  the  associated  discriminant  functions  distinguish 


among the groups.  The percentage of the total discriminating power of
the variables contained in the j-th discriminant function is represented by

$$100 \left( \frac{\lambda_j}{\sum_{i=1}^{N} \lambda_i} \right)$$

(N should be the smaller of k-1 and p).
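As a brief illustration of this percentage (the function name is hypothetical), given the eigenvalues of the discriminant functions:

```python
def percent_discriminating_power(eigenvalues):
    """Percentage of total discriminating power carried by each function."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [100.0 * value / total for value in eigenvalues]
```

With eigenvalues 3 and 1, for instance, the first function carries 75 per cent of the discriminating power and the second 25 per cent.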


In addition to obtaining the eigenvalues and discriminating coefficients,
the program will compute scaled vectors to show the relative contributions
of the variables to the discriminant function by

$$f^{*}_{ij} = (w_{ii})^{1/2}\, f_{ij}$$

II.  INPUT 

Input  to  the  DISCRIMINANT  ANALYSIS  program  consists  of  two  or  more 
data  groups.  Each  data  group  consists  of  a  set  of  observations  on 
two  or  more  variables.  All  of  the  groups  must  contain  observations  on 
the  same  set  of  variables.   The  groups  may  be  input  as  separate  card  decks 
(each  preceded  by  a  DATA  format  card  and  followed  by  an  END#  card),  as 
data  groups  located  on  separate  temporary  storage  areas,  or  as  a 
mixture  of  data  groups  on  card  decks  and  data  decks  on  temporary  storage 
areas.  For  examples  see  Section  VIII. 

III.  SIGNIFICANCE  TESTS 

The  measure  of  significance  calculated  in  the  DISCRIMINANT  ANALYSIS 
program  is  a  Wilks'  lambda  (likelihood  ratio  test  statistic).   This  is 
a  test  of  the  discriminating  power  of  the  test  battery.   It  tests  the 
hypothesis  that  the  population  centroids  (mean  vectors)  are  equal  for 
the k groups. The Wilks' lambda is a function of the roots of $W^{-1}A$ and
is of the following form:

\[ \Lambda = \prod_{i=1}^{r} \frac{1}{1 + \lambda_i} \]

where r is the lesser of k - 1 and p. In matrix terms this criterion is
defined in the following manner:

\[ \Lambda = \frac{|W|}{|T|} \]

where W is the pooled within-groups deviation score cross-products matrix and T
is the total sample deviation score cross-products matrix (see Section IV). As
|T| increases relative to |W| the ratio decreases in value, with an
accompanying increase in the confidence that the group centroids are not equal.


An F ratio which yields an approximate test of the significance of
the Wilks' lambda is calculated and printed:

\[ F = \frac{1 - y}{y} \cdot \frac{ms + 2\lambda}{2r} \]

where

\[
\begin{aligned}
s &= \sqrt{\frac{p^2 q^2 - 4}{p^2 + q^2 - 5}}, & q &= k - 1, \\
m &= n - (p + q + 1)/2, & n &= N - 1, \\
\lambda &= -(pq - 2)/4, & N &= \text{total number of subjects}, \\
r &= pq/2, & k &= \text{number of groups}, \\
y &= \Lambda^{1/s}, & p &= \text{number of variables}.
\end{aligned}
\]

The degrees of freedom to be used with the F value printed in the output
are printed and are labeled F1 (degrees of freedom for the numerator)
and F2 (degrees of freedom for the denominator); they equal 2r and
ms + 2λ, respectively.
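
As an illustration of the significance test, the sketch below computes Wilks' lambda from the eigenvalues and then the approximate F ratio with its degrees of freedom. It assumes y is lambda raised to the power 1/s, as in Rao's approximation, and it assumes the usual convention s = 1 when p^2 + q^2 - 5 is not positive; neither the function names nor that convention come from the SOUPAC source.

```python
import math

def wilks_lambda(eigenvalues):
    """Product of 1 / (1 + root) over the roots of W^{-1} A."""
    value = 1.0
    for root in eigenvalues:
        value *= 1.0 / (1.0 + root)
    return value

def approximate_f(wilks, p, k, n_total):
    """Return (F, F1, F2) for p variables, k groups, n_total subjects."""
    q = k - 1
    n = n_total - 1
    denom = p * p + q * q - 5
    # Assumed convention: s = 1 when the expression under the root is undefined.
    s = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    m = n - (p + q + 1) / 2.0
    lam = -(p * q - 2) / 4.0
    r = p * q / 2.0
    y = wilks ** (1.0 / s)
    f1 = 2 * r              # numerator degrees of freedom
    f2 = m * s + 2 * lam    # denominator degrees of freedom
    return ((1 - y) / y) * (f2 / f1), f1, f2
```

For two groups the approximation reduces to the exact F for Hotelling's T-squared, which gives a convenient check: with p = 2, k = 2, N = 6 and a single eigenvalue of 3, lambda is 0.25 and F is 4.5 on 2 and 3 degrees of freedom.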

IV.   OUTPUT 

The  output  consists  of  the  following: 

1.  Means  of  input  variables  for  each  group  and  group  sample  size 
(Parameter  Number  8) 

2.  A  dispersion  matrix  for  each  group.   (Parameter  Number  9) 

3.  The total sample deviation score cross-products matrix

\[ t_{ij} = \sum_{n=1}^{N} (X_{in} - \bar{X}_{i})(X_{jn} - \bar{X}_{j}) \]

where i and j range over the variables. This matrix is the sum
of the A and W matrices described in Section I. This is the T matrix
referred to in Parameter Number 7. The diagonal of this matrix
contains the sums of squares. (Parameter Number 6)

4.  The pooled within-groups deviation score cross-products matrix, which
is labeled W on the output. (Parameter Number 6)

5.  The total number of subjects in all the groups combined. (Parameter
Number 6)

6.  The  means  and  standard  deviations  of  the  variables  across  all 
groups.   (Parameter  Number  6) 

7.  The  correlation  matrix  of  variables  over  all  groups.   (Parameter 
Number  6 ) 


8.  The  among  groups  cross-products  of  deviations  of  group  from  grand 
means  weighted  by  group  sizes.   This  matrix  is  labeled  A  matrix  on 
the  output.   (Parameter  Number  6) 

9.  The eigenvalues for the $W^{-1}A$ matrix. (Printed)

10.  The eigenvalues and percentage of variance explained by each
additional eigenvalue. (Printed on output, automatically)

11.  The trace of the $W^{-1}A$ matrix. This is the sum of the eigenvalues.
(Printed on output, automatically)

12.  The discriminant functions ($f_{ij}$). The number of discriminant
functions will equal r, where r is the lesser of the two values
k-1 and p, where k = the number of groups and p = number of
variables. (Parameter Number 1)

13.  The group means on the discriminant functions. This is a
k x r matrix formed by multiplying the group means on variables
and the discriminant functions. The matrix may be used to determine
the relative positions of the groups on the derived function.
(Parameter Number 11)

14.  The scaled vectors. These vectors are formed by multiplying the
discriminant functions by the square roots of the diagonal of
the W matrix described above. The scaled vectors show the
relative contributions of the input variables to each of the
discriminant functions. (Parameter Number 5)

15.  The measures of significance described in Section III. (Parameter
Number 4)

V.   RESTRICTIONS 

The parameters for the DISCRIMINANT ANALYSIS program follow the program name on the main
program  card.   Each  parameter  must  be  enclosed  in  parentheses.   The  parameters 
must  appear  in  the  order  given  below.   If  a  parameter  is  not  needed,  do  not 
punch  anything  between  its  parentheses.   All  parentheses  after  the  last 
non-empty  pair  may  be  omitted. 

Parameter
Number     Use or Meaning

1 **       Output address of discriminant functions
           (matrix f_ij). SEQUENTIAL 1-5 and/or
           PRINT. (Needed for CLASSIFICATION).

2          Number of variables.

3          Number of groups.

4          1 if desire significance measures printed.

5          1 if desire scaled discriminant vectors
           printed.

6          1 if desire intermediate results printed.

7          Input Address of T matrix. CARDS or
           SEQUENTIAL 1-5. (See Section IV).

8          Output Address of group means on original
           variable and sample size (printed only).*
           SEQUENTIAL 1-5 and/or PRINT. (See Parameter
           12) (Needed for CLASSIFICATION).

9 **       Output Address of group dispersion matrices
           of original variables.* SEQUENTIAL 1-5
           and/or PRINT. (Needed for CLASSIFICATION).

10         N - total number of subjects in all groups
           combined. This parameter is left blank if
           raw data is input rather than W and T
           matrices.

11 **      Output Address of group means on discriminant
           functions. SEQUENTIAL 1-5 and/or PRINT.

12         Output Address of group sample sizes. Possibly
           needed for CLASSIFICATION.

Input addresses of raw data input groups are listed on a $INP card. The
number  of  groups  is  limited  to  16.   See  examples  for  illustrations. 

*  If  W  and  T  are  input  instead  of  raw  data,  group  means  and  dispersion 
matrices  are  not  printed.   Means  and  dispersion  matrices  on  discriminant 
functions  are  not  computed  in  this  case. 

** (Parameters 1, 9, and 11) It is possible to print in F format and/or
punch the output from these parameters. If you need either of these
options, see the section in the INTRODUCTION on INPUT and OUTPUT.

VII.   SPECIAL  COMMENTS 

1.  This  program  does  not  check  for  missing  data.   All  blank  spaces 
are  read  as  zeros. 

2.  The  user  is  cautioned  against  using  the  DISCRIMINANT  ANALYSIS 
program  without  an  understanding  of  the  statistical  technique 
used.   See  the  references  (Section  IX). 


VIII.   EXAMPLES 


/*ID
// EXEC SOUPAC
//SYSIN DD *
MATRIX.
MOV(C)(S5).
END P
DIS(S1)(40)(4)(1)()()()(S2/P)
$INP(C)(C)(S5)(C).
END SOUPAC
DATA(40)(40F2.0)

END#
/*

In this example four groups of data are being input with the first
two groups coming from cards, the third from temporary storage on S5
and the fourth from cards. Discriminant functions are stored on
S1 and printed and group means are stored on S2 and printed.
Significance measures are calculated for 40 variables input.


/*ID (accounting information)
// EXEC SOUPAC
//SYSIN DD *
DIS(S1)(15)(2)()()()()(S2)(S3)()()(S4)
$INP(C)(C).
CLA(S1)(S2)(S3)(C)(P)(P)(2)(15)(1)(S4)
END S
DATA(15)

END#
/*

This illustrates the use of the DISCRIMINANT and CLASSIFICATION programs.
The DISCRIMINANT program will save discriminant functions on S1; it
will operate on 15 variables for each of two groups. It will output
group means on S2 and group dispersion matrices on S3, and store group
sample sizes on S4.

CLASSIFICATION in turn will read discriminant functions, group means,
and group dispersion matrices from S1, S2, and S3 respectively. It
will read the group to be classified from cards and print the inverses of
the dispersion matrices and the chi-square and probability. It will expect
two sets of group means and dispersion matrices from 15 variables but
1 discriminant function (see Section IV). Original group sample
sizes are read from S4.

1961, pp. 106-108.

ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS

I.   General  Description 

The  ECONOMETRIC  REDUCED  FORM  AND  RESIDUAL  ANALYSIS  program  calculates 
the  following : 

(1)   Residuals

\[ U = (Y \; X) \, [B \; \Gamma]' \]

where (Y X) is the raw data matrix of endogenous and exogenous variables
and [B Γ] is the matrix of coefficient estimates.

(2)   Durbin-Watson statistic for each equation (i)

\[ \frac{\sum_{t=2}^{N} [U_i(t) - U_i(t-1)]^2}{\sum_{t=1}^{N} [U_i(t)]^2} \]

(3)   Covariance matrix for residuals

\[ W = [B \; \Gamma] \, (S) \, [B \; \Gamma]' \]

where S is the raw data covariance matrix.

(4)   Reduced form estimates

\[ \Pi = -(B)^{-1}(\Gamma) \]

(5)   Reduced form predicted values

\[ \hat{Y} = X \Pi' \]

(6)   Reduced form residuals

\[ V = Y - \hat{Y} \]

(7)   Covariance matrix for reduced form residuals

\[ (B^{-1}) \, W \, (B^{-1})' \]
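
The Durbin-Watson statistic of item (2) is simple enough to sketch directly. The function below is an illustration only; its name and the use of a plain Python list for one equation's residual series are assumptions of the example, not part of SOUPAC.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for one equation's residual series U_i(1..N)."""
    # Numerator: squared successive differences, t = 2..N.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals, t = 1..N.
    denominator = sum(u * u for u in residuals)
    return numerator / denominator
```

A perfectly alternating series such as 1, -1, 1, -1 gives 12/4 = 3, while a constant series gives 0; values near 2 indicate little first-order serial correlation.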




References : 

Johnston, J., Econometric Methods, New York, McGraw-Hill Book
Company, Inc., 1960.

Goldberger, Arthur S., Econometric Theory, New York, John Wiley
and Sons, Inc., 1964.

II.   Restrictions 

Only  those  inputs  used  in  the  calculations  called  for  need  be  given. 
They  must  be  in  the  following  formats : 

(1)  Coefficients: 

The coefficient matrix for K equations with N variables, N1
exogenous and N2 endogenous, must be a K by N+1 matrix. Each
row corresponds to an equation. The first element in each row
is the constant term, followed by the coefficients (i.e., exogenous
coefficients first; endogenous coefficients next). In
each row, there must be a -1 which corresponds to the endogenous
variable that was normalized on.

(2)  Raw  Data: 


The data must be arranged so that exogenous variables occur
first and endogenous variables last. (The TRANSFORMATION program
may be used to arrange data in this way, if it is not already
like this).

(3)   Raw  Data  Covariance  Matrix: 

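The covariance matrix must have the following form:

    +-------------+-----------------------------------------------+
    | sample size | (standard deviations)                         |
    +-------------+-----------------------------------------------+
    | X (means)   | Covariance (exogenous first, endogenous last) |
    +-------------+-----------------------------------------------+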

Care should be taken to see that an input address is specified for any data
needed in calculating the desired statistics and that any intermediate
statistics needed are stored (i.e., an output address besides print is
specified). The following list indicates which previous statistics are
needed in the calculation of each statistic.

1.  Residuals - coefficients and raw data

2.  Durbin-Watson statistic - coefficients and raw data

3.  Covariance matrix for residuals - coefficients and raw
data covariance matrix

4.  Reduced form coefficients - coefficients

5.  Reduced form predicted values and residuals - reduced form
coefficients (no output address) and raw data

6.  Covariance matrix of reduced form residuals - reduced form
coefficients and covariance matrix of original residuals

This  program  is  restricted  to  less  than  150  variables. 

III.   Parameters

The parameters appear on the program card following the name ECONOMETRIC
in the following order:

Parameter
Number     Use or Meaning

1          Input Address for coefficients. SEQUENTIAL 1-15.

2          Input Address for raw data covariance matrix.
           SEQUENTIAL 1-15.

3          Input Address for raw data. SEQUENTIAL 1-15.
           (See Special Comments).

4          Number of exogenous variables (total).

5          Output Address for residuals. SEQUENTIAL 1-15
           and/or PRINT.

6          Output Address for covariance matrix of
           residuals. SEQUENTIAL 1-15 and/or PRINT.

7          If greater than 0, reduced forms are
           calculated and printed.

8          If greater than 0, reduced form predicted
           values and residuals are printed.

9          If greater than 0, covariance matrix for
           reduced form residuals is printed.

IV.   Special Comments

If  Parameter  Number  3  is  specified,  the  Durbin-Watson  statistic  will 
be  calculated  and  printed. 


January, 1970

FREQUENCY COUNTING AND
MEASURES OF ASSOCIATION


I.   General Description

This program computes tables of frequency of occurrence of values of
the input variables and, where appropriate, measures of association. Input
can be in the form of previously computed tables or as raw data. Output
is printed only and cannot be stored. Counting is only done on the integer
portion of the input value. Negative values may also be counted.

A.   FREQUENCY  COUNTING 

Many  options  are  available  for  the  frequency  tables. 

1.  Either one- or two-dimensional tables can be specified. With
one-dimensional tables, the frequencies of one variable are
listed. With two-dimensional tables, the frequencies of the
1st variable against the 2nd are given as a two-dimensional
matrix.

2.  Control  variables  may  be  used  which  enable  counting  to  be  done 
in  up  to  twelve  dimensions.  When  control  variables  are  used, 
the  data  must  be  presorted  on  the  control  variables.   Counting 
by  the  program  will  proceed  as  long  as  each  data  row  encountered 
has  the  same  value  for  each  control  variable  as  the  previous 
row.   If  any  control  variable  has  a  value  different  from  that 

in  the  previous  row,  a  new  table  is  started. 

3.  Maximum  and  minimum  values  may  be  given  for  each  variable.  When 
these  are  given,  an  additional  read  of  the  data  is  not  necessary. 
Since  reading  of  data  is  time  consuming,  it  is  advisable  to 
specify  maxima  and  minima  though  not  necessary.  Values  will  be 
ignored  in  the  frequency  counting  if  they  fall  either  below  the 
minimum  or  above  the  maximum. 

4.  For each cell in a table the percentage of the total, of the row,
and/or of the column may be printed as an option.

5.  A weighting variable may be specified. Without a weighting variable,
frequency counts are advanced by one for each occurrence of
a value. When a weighting variable is used the frequency counts
are advanced by the value of the weighting variable for each
observation. This is useful when data samples represent larger
populations and an estimate of the frequencies of the larger
populations is desired.

6.  Labels can be given for variables so they are more easily identified.
Each label is restricted to eight characters.

7.  Input may be previously computed tables from which measures of
association can be directly computed.

8.  Marginals are automatically printed.
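
The interaction of control variables, integer truncation, and weighting in the options above can be illustrated with a short sketch. It is not the program's implementation: the function name `count_tables`, the representation of data rows as tuples, and the 0-based column indexes (the manual numbers variables from 1) are all assumptions of the example.

```python
from itertools import groupby

def count_tables(rows, var, controls=(), weight=None):
    """One-way frequency tables of column `var`, one table per run of rows
    that share the same control-variable values (columns are 0-based)."""
    tables = []
    control_key = lambda row: tuple(row[c] for c in controls)
    # groupby starts a new group whenever the key changes between adjacent
    # rows, which is why the data must be presorted on the control variables.
    for control_values, block in groupby(rows, key=control_key):
        counts = {}
        for row in block:
            value = int(row[var])   # only the integer portion is counted
            step = row[weight] if weight is not None else 1
            counts[value] = counts.get(value, 0) + step
        tables.append((control_values, counts))
    return tables
```

With rows `[(1, 5.7, 2), (1, 5.2, 3), (2, 5.0, 1), (1, 4.0, 1)]`, column 1 counted and column 0 as the control variable, three tables result rather than two, because the last row's control value differs from the row before it even though that value occurred earlier in the unsorted data.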
B.   MEASURES OF ASSOCIATION

The following coefficients are calculated and printed on option for
two-way tables:

1.   Chi-square and related coefficients

Let:  n = total population of the table

      $n_{ab}$ = number in Vertical classification a (column a)
      and Horizontal classification b (row b)

      $n_{a.} = \sum_b n_{ab}$

      $n_{.b} = \sum_a n_{ab}$

      α = number of rows
      β = number of columns

Then:

\[ \text{chi-square} = \sum_a \sum_b \frac{(n_{ab} - n_{a.} n_{.b}/n)^2}{n_{a.} n_{.b}/n} \]

adjusted chi-square (Yates' correction for continuity) for 2x2
tables only =

\[ \sum_a \sum_b \frac{(|n_{ab} - n_{a.} n_{.b}/n| - 1/2)^2}{n_{a.} n_{.b}/n} \]

\[ C = \left( \frac{\text{chi-square}/n}{1 + \text{chi-square}/n} \right)^{1/2} \]

\[ T = \left( \frac{\text{chi-square}/n}{[(\alpha - 1)(\beta - 1)]^{1/2}} \right)^{1/2} \]

C and T are measures of contingency and can be looked up in contingency
tables. The maximum expected frequency is also printed.
2.   Lambda coefficients

Let:  $n_{am} = \max_b n_{ab}$

      $n_{mb} = \max_a n_{ab}$

      $n_{.m} = \max_b n_{.b}$

      $n_{m.} = \max_a n_{a.}$

\[ \text{Lambda} = \frac{\sum_a n_{am} + \sum_b n_{mb} - n_{.m} - n_{m.}}{2n - n_{.m} - n_{m.}} \]

\[ \text{Lambda H} = \frac{\sum_a n_{am} - n_{.m}}{n - n_{.m}} \]

\[ \text{Lambda V} = \frac{\sum_b n_{mb} - n_{m.}}{n - n_{m.}} \]

Lambda coefficients will be indeterminate if all values lie in
one column or row.

Lambda H can be defined as the relative decrease in probability of error
in predicting the H-variable when knowledge of the value of the
V-variable is considered as opposed to random guessing of the
H-variable.

Ninety-five per cent confidence limits are calculated and printed
for Lambda H and Lambda V using the methods discussed by Goodman
and Kruskal in their second article. (See references). Lambda
is always between Lambda H and Lambda V.
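
The three lambda coefficients can be sketched as follows, taking the formulas as the Goodman and Kruskal definitions. The layout `table[a][b]`, with a the V classification and b the H classification, and the function name are assumptions made for the illustration.

```python
def lambda_coefficients(table):
    """Return (Lambda, Lambda H, Lambda V) for table[a][b] = n_ab."""
    n = sum(sum(row) for row in table)
    n_a = [sum(row) for row in table]              # n_a.
    n_b = [sum(col) for col in zip(*table)]        # n_.b
    sum_am = sum(max(row) for row in table)        # sum over a of max_b n_ab
    sum_mb = sum(max(col) for col in zip(*table))  # sum over b of max_a n_ab
    n_dot_m = max(n_b)                             # n_.m
    n_m_dot = max(n_a)                             # n_m.
    # A ZeroDivisionError here corresponds to the indeterminate case in which
    # all values lie in one column or row.
    lam_h = (sum_am - n_dot_m) / (n - n_dot_m)
    lam_v = (sum_mb - n_m_dot) / (n - n_m_dot)
    lam = (sum_am + sum_mb - n_dot_m - n_m_dot) / (2 * n - n_dot_m - n_m_dot)
    return lam, lam_h, lam_v
```

For the table with rows (30, 10, 0) and (5, 5, 10), Lambda H is 5/25, Lambda V is 10/20, and the symmetric Lambda is 15/45, which lies between the two as stated above.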

3.   Weighted Lambda Coefficients

\[ \text{Weighted Lambda H} = \frac{\sum_a \frac{n_{am}}{n_{a.}} - \max_b \sum_a \frac{n_{ab}}{n_{a.}}}{\alpha - \max_b \sum_a \frac{n_{ab}}{n_{a.}}} \]

\[ \text{Weighted Lambda V} = \frac{\sum_b \frac{n_{mb}}{n_{.b}} - \max_a \sum_b \frac{n_{ab}}{n_{.b}}}{\beta - \max_a \sum_b \frac{n_{ab}}{n_{.b}}} \]

These are Lambda H and Lambda V calculated using the weighted quantities
$(1/\alpha)(n_{ab}/n_{a.})$ and $(1/\beta)(n_{ab}/n_{.b})$, respectively,
instead of $n_{ab}$.

No confidence limits are provided for the weighted lambda coefficients.

4.   Gamma Coefficient

Let:

\[ PS = \sum_a \sum_b n_{ab} \Big[ \sum_{a'>a} \sum_{b'>b} n_{a'b'} \Big] \]

\[ PD = \sum_a \sum_b n_{ab} \Big[ \sum_{a'>a} \sum_{b'<b} n_{a'b'} \Big] \]

Then

\[ \text{Gamma} = \frac{PS - PD}{PS + PD} \]

Ninety-five per cent confidence limits for Gamma are calculated
using the method outlined and preferred in the second article by
Goodman and Kruskal.
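
A direct, unoptimized sketch of the Gamma computation is given below. It takes PS as the count of concordant pairs and PD as the count of discordant pairs, which is the Goodman and Kruskal definition; the function name and the `table[a][b]` layout are assumptions of the example.

```python
def gamma_coefficient(table):
    """Goodman-Kruskal gamma for an ordered two-way table[a][b] = n_ab."""
    rows, cols = len(table), len(table[0])
    ps = 0   # pairs ordered the same way on both classifications
    pd = 0   # pairs ordered in opposite ways
    for a in range(rows):
        for b in range(cols):
            for a2 in range(a + 1, rows):
                for b2 in range(cols):
                    if b2 > b:
                        ps += table[a][b] * table[a2][b2]
                    elif b2 < b:
                        pd += table[a][b] * table[a2][b2]
    return (ps - pd) / (ps + pd)
```

For the table with rows (10, 20) and (20, 10), PS is 100 and PD is 400, giving a Gamma of -0.6; a table with all counts on the main diagonal gives 1.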

C.   References 

These coefficients are discussed and compared by Leo A. Goodman and
William H. Kruskal in their article "Measures of Association for
Cross Classification", American Statistical Association Journal,
December, 1954.

The Gamma coefficient is their suggested measure.

The C coefficient was first suggested by Karl Pearson and the T
coefficient is due to Tchuprow.

The Lambda coefficients apparently were first suggested by Louis
Guttman ("The Prediction of Personal Adjustment", Bulletin 48,
Social Science Research Council, New York, 1941).

The development of the approximate sampling theory and of the
machinery for calculating the confidence intervals for Lambda and
Gamma was done in a sequel article by Goodman and Kruskal: "Measures
of Association for Cross Classification III; Approximate Sampling
Theory", American Statistical Association Journal, June, 1963.

The  statistics  that  are  requested  will  be  printed  immediately 
following  each  table. 

II.   Restrictions 

The program is limited to 400 variables and 1000 tables. Tables are
restricted to 40000 cells; if a table larger than this is requested, a
message is written that the table is too large. As many tables as can
be fitted into 40000 cells are generated in each read of the data. Card
input is allowed if the tables fit in 40000 cells. If the tables do not
fit in 40000 cells and maxima and minima are not specified, the data
will be transferred to a disk inside the frequency program so that a reread
will be possible and card input allowed.

III.   Parameters

A.   Main Parameter Card

Immediately following the program name FREQUENCY (mnemonic: FRE), the
following parameters are listed, each enclosed in parentheses with a
period after the last parameter used:

Parameter
Number     Use or Meaning

1          Input Address.

2          0 - ignore blanks
           1 - count blanks separately
           2 - count blanks as zeroes

3          Spacing
           0 - normal spacing
           1 - one table per page

4          Address of labels.

5          Variable number of weight variable.

6          Type of input
           0 - raw data
           n - where n is the number of
               previously computed tables.
               If n > 1, then input must be
               from cards, and each table is
               a separate data deck.

If both parameters 1 and 4 are cards, the labels must precede data.

B.   Subparameters 

Subparameters  follow  the  main  parameter  card  and  can  be  in  any  order. 
A  period  must  follow  each  subparameter  statement  though  the  statement 
can  be  continued  on  more  than  one  card.   If  the  subparameter  statement 
is  left  out,  the  option  is  not  used.   In  the  following  explanation 
I  =  integer  and  F  =  real  number. 

Mnemonic                  Use or Meaning

PER(I)(I)(I).             Per cents are requested.
                            1 - yes
                          1st integer = total per cent
                          2nd integer = row per cent
                          3rd integer = column per cent

MIN(F)...(F).             Minimum and maximum are given. The
MAX(F)...(F).             last value is propagated to any
                          remaining variables. Data will be
                          reread if either MIN or MAX is missing.

MEA(I)(I)(I)(I).          Measures are requested.
(Only applicable to         0 - no
2-way tables)               1 - yes
                          1st integer = chi-square (with a code of
                            2 both chi-square and a table of expected
                            frequencies will be printed)
                          2nd integer = lambda
                          3rd integer = weighted lambda
                          4th integer = gamma

CONTROL(I)(I)...          Up to 10 control variables are allowed.
                          The I's should be the variable numbers
                          of the control variables.

ONE(I,I,I)(I,I,I)...      One and only one of these two must be
TWO(I,I,I)(I,I,I)...      in every program. ONE means one-way
                          tables. TWO means two-way tables.
                          In ONE, (I,I,I) specifies one range of
                          tables. In TWO, (I,I,I)(I,I,I) specifies
                          one range of 2-dimensional tables.

The notation (I,I,I) has the following meaning: If it is absent
completely, i.e., ONE. or TWO., then all possible tables are calculated.
The first integer is the initial value, the second is the terminal
value and the third is the increment. It means: take all values
starting at the first integer and stepping by the third integer until
you reach the second integer. If the third integer is missing, the
increment is taken to be one. If the second is also missing, then
the first is taken as a single table specification. As many as
wanted can be specified subject to the following restriction: In
the two-way tables, no more than 540 separate ranges, i.e., (I,I,I)(I,I,I),
can be specified.
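
The range notation can be made concrete with a small sketch that expands specifications into the list of tables they request. The tuple representation of a specification and the function names are hypothetical; only the expansion rule comes from the text above.

```python
def expand(spec):
    """Expand (first,), (first, last) or (first, last, step) into variable numbers."""
    first = spec[0]
    last = spec[1] if len(spec) > 1 else first   # single value if last is missing
    step = spec[2] if len(spec) > 2 else 1       # increment defaults to one
    return list(range(first, last + 1, step))

def one_way_tables(specs):
    """Variables tabulated by ONE(...)(...)..."""
    return [v for spec in specs for v in expand(spec)]

def two_way_tables(specs):
    """Variable pairs tabulated by TWO(...)(...)...; specs are taken in pairs."""
    tables = []
    for left, right in zip(specs[0::2], specs[1::2]):
        tables += [(i, j) for i in expand(left) for j in expand(right)]
    return tables
```

Under this reading `ONE(2,6,2)(9).` requests tables for variables 2, 4, 6, and 9, and `TWO(3)(4)(2,6,2)(1,5,2).` requests 3 vs 4 followed by each of 2, 4, 6 against each of 1, 3, 5, which agrees with the tables listed in the examples of Section V.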


IV.   Labels 


Labels can come from cards or temporary storage. Each label should
be treated as if it were two variables each 4 characters long. For
example, if there are 6 variables in the input data, then there would be
twelve variables for labels and the data card would be DATA(12)(12A4).

All the labels are treated as one row of input n variables long.
Labels need not be given for each variable but if a variable is skipped
and more labels follow, then it should be replaced with eight blanks.


V.   Examples 

1.   FRE(C).
     PER(1).
     CONTROL(1)(3).
     ONE(2,6,2)(9).
     END P

Input is from cards; per cent of totals will be printed; control variables
are 1 and 3; tables are 2, 4, 6, and 9. Blanks will be ignored.

2.   FRE(C)(2)(1)(C)(2).
     PER(1)(1)(1).
     TWO(3)(4)(2,6,2)(1,5,2).
     END P

All per cents will be given; 2 will be the weighting variable; resulting
tables will be 3 vs 4; 2 vs 1; 2 vs 3; 2 vs 5; 4 vs 1; 4 vs 3; 4 vs 5; 6 vs 1;
6 vs 3; 6 vs 5. Tables will be printed one per page and blanks will be
counted as zeroes.

Since both labels and data are on cards the deck will look like this:

FRE(C)

END S
DATA(n)(nA4)
label for first labeled variable

label for last labeled variable
END#
DATA(n/2)( )

END#

3.   FRE(S1)(1).
     TWO.
     MEA(2)(1)(1)(1).
     END P

All possible two-way tables will be calculated; all four measures will be
calculated and the table of expected frequencies will be printed. Blanks
will be counted separately.

4.   FRE(C)()()()()(2).
     MEA(1)()(1).
     END P

Input is in the form of two previously computed tables. Chi-square and
weighted lambda will be calculated.

Since there are two tables the deck will look like this:

FRE(C)

END S
DATA

END#
DATA

END#

SOUPAC (Statistically Oriented Users Programming and Consulting)


ITERATIVE  FACTOR  ANALYSIS 


I.   General  Description 

A.   Procedural 

This routine, upon option, provides one of four iterative factorization
methods:

1.  Alpha factor analysis (AFA, Kaiser, 1962)

2.  Canonical factor analysis (CFA, Rao, 1955, Harris, 1962)

3.  Stepwise maximum likelihood factor analysis (MLFA, Lawley, 1940)

4.  Iterative principal axis factor solution (IPRAX, Traditional)

All  four  methods  have  in  common  that  communalities  and  factor  loadings 
are  estimated  simultaneously.   In  three  cases  (AFA,  CFA,  IPRAX)  the 
number  of  factors  decision  can  be  made  beforehand  by  the  user,  or  it 
can  be  left  to  the  program,  in  which  case  appropriate  modifications  of 
Guttman's  lower  bound  criterion  will  be  used. 

The four methods differ from each other in theory with respect to the
defining criterion of optimization, and consequently they differ
technically with respect to the matrix that is diagonalized in each case.

1.   AFA (Kaiser)

Optimization criterion: maximize the alpha-reliabilities
(Cronbach) of the retained factors. If the number of factors
decision is left to the program, the Kaiser modification of
the Guttman criterion will be used and all factors with
positive alpha-reliability will be iterated upon.

The diagonalization is on the matrix C in

\[ C = H^{-1} (R - U^2) H^{-1} \quad \text{so that} \quad C = Q \Theta Q' \]

where R is an n x n input matrix of covariances, $H^2$ is a diagonal
matrix of communalities, $U^2 = I - H^2$ is a diagonal matrix of
uniquenesses, and Q is an n x m matrix of latent vectors corresponding
to the m largest latent roots in $\Theta$, which are used to recompute
new estimates of $H^2$ through $F = H Q \Theta^{1/2}$. An initial set of $H^2$
is provided by $I - (\mathrm{diag}(R^{-1}))^{-1}$, which is equivalent to the
squared multiple correlations of R if R itself is a correlation
matrix.

Invariance under scaling: Kaiser has shown that the resulting
factors will be invariant under scaling, i.e., if a covariance
matrix R gives rise to a factor matrix F then the covariance matrix
SRS will give rise to a factor matrix SF (S diagonal).

Behavior of latent roots, alpha reliabilities: the n-m rejected
roots of C add to zero at each stage (i.e., C is non-Gramian);
the m accepted roots are simple functions of the alpha-reliabilities
of the retained factors in F. These reliabilities will be output
by this sub-program.
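
The starting communalities for AFA are easy to illustrate. The sketch below computes I - (diag(R^-1))^-1 for a correlation matrix with NumPy; the function name is hypothetical, and the sketch covers only the initial estimate, not the iteration.

```python
import numpy as np

def initial_communalities(R):
    """Diagonal of I - (diag(R^{-1}))^{-1}: squared multiple correlations
    when R is a correlation matrix."""
    R = np.asarray(R, dtype=float)
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))
```

For two variables correlated 0.5, each squared multiple correlation is simply the squared correlation, 0.25, which the function reproduces.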
2.   CFA (Rao, Harris)

Optimization criterion: maximize the correlations between m
linear combinations of the common parts of the variables with m
linear factors that are canonically correlated (Hotelling) with
the variables in the common factor space. If the number of
factors decision is left to the program, the Harris modification
of the Guttman criterion will be used, leading to a Gramian
$R - U^2$ of minimum rank.

The diagonalization is on the matrix

\[ C = U^{-1} (R - U^2) U^{-1} \quad \text{so that} \quad C = Q \Theta Q' \]

where $F = U Q \Theta^{1/2}$ is used to recompute new estimates for $U^2$,
retaining the m largest roots of C in $\Theta$. The notation is the
same as in section 1 (AFA). An initial set of $U^2$ is provided by
$[\mathrm{diag}(R^{-1})]^{-1}$.

Invariance under scaling: the resulting factors are again invariant
under scaling as defined in section 1 (AFA).

Behavior of latent roots; chi-square criterion: Rao has shown
that the n-m rejected roots approach unity at convergence. For
exact rank m data they will be "exactly" unity within the tolerance
of the convergence criterion ETA (see section III - B).
For data containing random error their departure from unity
provides a likelihood ratio test for the hypothesis that the
population matrix $P - V^2 = GG'$, where P, V, G are population
parameters corresponding to R, U, F in the sample, is rank m
or less. A criterion for this test is computed by this sub-program
which can be compared with two chi-square approximations
which are also output by this sub-program. Note that such a
chi-square test is valid only if the iterative process has indeed
converged, as indicated by the maximal discrepancy between trial
vectors which is printed out for that purpose.

3-   MLFA  (Lawley) 

The  CFA  variant  of  the  program  can  be  used  for  a  step-wise  maximum 
likelihood  factorization  in  the  Lawley-Rao  sense. 

Optimization  criterion:   maximize  the  likelihood  function  corre- 
sponding to  the  multivariate  normal  distribution  with  covariance 
matrix  parameters  P=GG'  +  V2  (as  defined  in  section  2,  CFA),  given 
the  sample  matrix  R,  under  choice  of  G  and  ¥  and  observing  the 
r,ide-conditions  that  P  - V^  is  Gramian  and  V^  diagonal  with  0<  v-j_  <  1  . 

The diagonalization is the same as in CFA, section 2; hence the
resulting factors are again invariant under scaling as defined in
section 1 (AFA).

In contrast to CFA, however, the number of factors decision is
made on statistical grounds. The user would start with a reasonable
guess for m (preferably m < n/2 to ensure positive degrees
of freedom for the chi-square test, which otherwise will be
bypassed). After convergence has been obtained, the user would
inspect the chi-square statistic. If the statistic is below the
table value at the chosen probability level (.05 or .01), then
the hypothesis can be accepted at this level with corresponding
risk, and the user has the option to reduce m for a second run,
etc. On the other hand, if the adjusted statistic exceeds the
table value, then m must be raised until the adjusted statistic
warrants acceptance of the hypothesis.

Within the package the user is free to re-enter the routine
repeatedly with sequentially decreasing or increasing m specified on
the call card. Since the test assumes convergence, it is
important that the number of iterations allowed be large enough
for convergence to occur within the chosen tolerance bound ETA
(see section III - B).

4.   IPRAX (traditional)

Optimization criterion: none

The diagonalization is on the matrix

C = R - U^2    so that C = Q Θ Q'

where F = Q Θ^(1/2) is used to recompute H^2 = I - U^2, retaining
the m largest roots in Θ. The notation is the same as in
section 1 (AFA). An initial set of H^2 is provided by the
identity matrix. If the number of factors decision is left
to the program, the unmodified Guttman criterion will be used,
i.e., all factors corresponding to roots of the input matrix
R which exceed unity will be retained.

Invariance  under  scaling:   as  defined  in  section  1  (AFA)  is 
not  obtained  by  this  method. 

The  behavior  of  the  latent  roots  is  not  known  at  present.  No 
statistical  or  other  significance  can  be  attached  to  the  m 
largest or n-m smallest roots of C.

5.   Both covariance matrices and correlation matrices are
acceptable as input. If covariances are used the tenth parameter
should  be  1.   In  this  case  the  covariance  matrix  is  scaled  into 
a  correlation  matrix,  and  all  computations,  in  particular  the 
number  of  factors  decision,  are  based  on  this  correlation  matrix. 
At  the  final  stage  the  factors  are  scaled  back  so  as  to  account 
for  the  covariance  matrix  which  was  input.   The  matrix  of  residuals 
is  computed  in  the  metric  of  the  covariances. 
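
The scaling described in item 5 can be sketched as follows. This is an illustration in Python, not SOUPAC code; the function names are hypothetical, and the back-scaling rule (multiply each variable's loadings by its standard deviation) is the usual convention, assumed here rather than taken from the program source.

```python
import math

def covariance_to_correlation(cov):
    """Scale a covariance matrix into a correlation matrix.

    Returns (corr, sd), where sd holds the standard deviations taken
    from the square roots of the diagonal of cov.
    """
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]
    return corr, sd

def rescale_factors(factors, sd):
    """Scale factor loadings (one row per variable) back to the covariance metric."""
    return [[sd[i] * loading for loading in row] for i, row in enumerate(factors)]

def residual_matrix(matrix, factors):
    """Residual matrix: matrix - F F'."""
    n = len(matrix)
    return [[matrix[i][j] - sum(a * b for a, b in zip(factors[i], factors[j]))
             for j in range(n)] for i in range(n)]
```

Factoring would be carried out on `corr`; the loadings returned by `rescale_factors` then reproduce the covariance matrix, and `residual_matrix(cov, rescaled)` gives residuals in the metric of the covariances.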

II.   Restrictions

Input  is  restricted  to  matrices  of  order  100  x  100  or  less.   Up  to 
50  factors  can  be  handled  by  this  program.   If  the  number  of  factors 
decision  is  left  to  the  program  and  more  than  50  factors  are  estimated, 
an  appropriate  message  will  be  printed  out  and  control  will  be  returned 
to  the  system. 

III.   Parameters

The  program  name  is  ITERATIVE  FACTOR  ANALYSIS.   After  the  name  on 
the  call  card  the  parameters  must  appear  in  the  following  order: 

Parameter
Number      Use or Meaning

   1        Input Address of data matrix.  CARDS or SEQUENTIAL 1-15.

   2        Output Address of data matrix.  SEQUENTIAL 1-15 and/or
            PRINT.

   3        Output Address for principal axis factors.  SEQUENTIAL
            1-15 and/or PRINT.

   4        Output Address of residual matrix.  SEQUENTIAL 1-15
            and/or PRINT.

   5        Option code:   0 if IPRAX
                           1 if ALPHA
                           2 if CANONICAL
                           3 if STEP-WISE MAXIMUM LIKELIHOOD

   6        Maximum number of cycles to be executed.  If left blank,
            50 cycles will be used as upper limit.

   7        Number of factors to be extracted.  If left blank, all
            factors with roots exceeding unity will be retained.

   8        Exponent of convergence, n, where tolerance ETA = 10^-n.
            If left blank, n = 3 or ETA = 10^-3.  If all goes well,
            the program will stop as soon as either one of the
            stopping criteria is met.  Error stops, if they occur,
            are labelled accordingly.

   9        Sample size (for CFA only).  If left blank, the
            chi-square computations are bypassed.  If specified,
            chi-square is computed with the sample size.

  10        1 if input matrix was a covariance matrix.


A.  Output common to all four sub-programs

1.  Matrix  output  within  system  conventions: 

a.  R (input covariance matrix)

b.  F (factor matrix)

c.  R-FF' (residual matrix)

2.  Vector  output,  print  only: 

a.  communality  vector  (last  iteration) 

b.  vector  of  latent  roots  of  C  (last  iteration) 

3.  Constant, print only:

a.  number  of  iterations  completed 

b.  largest discrepancy between trial vectors
(H^2, U^2, H^2, depending on sub-program)

c.  root mean square of off-diagonal residual matrix

d.  per  cent  of  variance  removed 

B.  Additional  output  specific  to  sub-programs 

AFA:   The  alpha-reliabilities  of  the  m  retained  factors 

CFA:   chi-square statistic, chi-square approximations (Wilson,
Hilferty) for p = .05 and p = .01, for comparison with
statistic.  Degrees of freedom.

IV.   Special Comments


The  accuracy  should  be  approximately  6  digits  in  computations, 
possibly  somewhat  lower  for  a  very  large  number  of  iterations.   The 
effective  accuracy  depends  on  the  chosen  tolerance  ETA  and  the  actual 
convergence  as  indicated  by  the  largest  discrepancy  between  trial 
vectors.   The  chi-square  approximations  are  within  2  x  10"^  for  more 
than  8  degrees  of  freedom. 


October 14, 1969


I.   General Description

This  program  calculates  eigenvalues  and  eigenvectors  of  a  square, 
symmetric  matrix,  using  the  JACOBI  rotating  technique.   This  program  is 
limited  to  matrices  of  110  rows  and  110  columns.   The  user  should  realize 
that  this  technique  is  extremely  slow  on  large  matrices,  while  the  program 
takes  no  longer  than  PRINCIPAL  AXIS  FACTOR  ANALYSIS  for  small  matrices 
(up  to  20  x  20). 
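
For readers unfamiliar with the technique, the following Python sketch shows a cyclic Jacobi rotation scheme for a symmetric matrix. It is an illustration only and is not the SOUPAC source; the function name, the tolerance, and the sweep limit are assumptions.

```python
import math

def jacobi_eigen(matrix, tol=1.0e-12, max_sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix by Jacobi rotations.

    Returns (values, vectors): values in descending order, and vectors[k]
    the unit eigenvector belonging to values[k].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]   # working copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation that zeroes a[p][q]; the smaller angle is chosen.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):          # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):          # A <- J' A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):          # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq

    # Sort from largest to smallest eigenvalue, eigenvectors in the same order.
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors
```

Each sweep visits every off-diagonal pair once, so the work per sweep grows with the cube of the matrix order, which is consistent with the warning above about large matrices.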

II.   Parameters 


Parameter
Number      Use or Meaning

   1        Input Address of correlation matrix.  CARDS
            or SEQUENTIAL 1-15.

   2        Output Address of eigenvectors.

   3        Output Address of principal axis factor.

   4        Output Address of eigenvalues, stored as a row
            vector.  PRINT is not valid.  The eigenvalues
            are always printed.

   5        Number of eigenvectors (or factors) to be output.


III.   Special  Comments 

The  eigenvalues  are  stored  in  descending  algebraic  order  (from  largest 
to  smallest),  and  the  eigenvectors  and  factors  are  placed  in  the  same  order. 


IV.  Reference 

Ralston, A. and Wilf, H. S.:  Mathematical Methods for Digital Computers,
John Wiley and Sons, New York, 1964.


January  22,  1970 

SOUPAC (Statistically Oriented Users Programming and Consulting)


K  CLASS  ESTIMATION 

I.    GENERAL  DESCRIPTION 

The K1Class and K2Class programs work together to provide the user
with the general K Class estimation techniques for a system of simultaneous
equations.  In particular, this includes Ordinary Least Squares, 2-Stage
Least Squares, and Limited Information Estimators.

K1Class computes and prints:

Mean:                \bar{X}_i = \frac{\sum X_i}{N}

Covariance:          S_{ij} = \frac{N \sum (X_i X_j) - (\sum X_i)(\sum X_j)}{N (N)}

Standard Deviation:  S_i = (S_{ii})^{1/2}

K2Class calculates and prints from the covariance matrix computed
by K1Class:

(1)  The estimated coefficients

(2)  The standard error of estimate

(3)  The standard error of the estimated coefficients

(4)  The intercept term

(5)  The covariance matrix of coefficients

In addition, for ordinary least squares (k=0), K2Class prints:

(6)  R-Squared

(7)  Total sum of squares

(8)  Regression sum of squares

(9)  Error sum of squares

Residuals and Reduced forms can be obtained directly from the Econometric
Reduced Form and Residual Analysis program.

References : 

Johnston, J., Econometric Methods, New York, McGraw-Hill Book Company,
Inc., 1963.

II.   RESTRICTIONS 

A.   The programs are limited to 135 variables, except for LIE which is
limited to 95.

B.   K1Class accepts data from cards or intermediate storage either as
raw data, cross-products, or covariance.  If cross-products or
covariance are used as input to K1Class, the matrix must be in
the following order:

     Cross-products                 Covariance

     N      Σx                      X̄      covariance
     Σx     cross-products


Input to K2Class must come from K1Class.  However, once data has
been processed by K1Class, it may be used by K2Class any number
of times.

Variables must be ordered: exogenous, then endogenous (independent,
then dependent).


III.   PARAMETERS 

A.   Main  Parameter 


The parameters for K1Class appear on the program call card in the
following order:

Parameter
Number      Use or Meaning

   1        Input Address.  CARDS, SEQUENTIAL 1-15.

   2        Output Address for raw data covariance matrix.
            SEQUENTIAL 1-15.

   3        Type of Input:   0 = raw data
                             1 = cross-products
                             2 = covariance

   4        Number of equations of LIE; 0 otherwise.

   5        Total number of exogenous variables in LIE system;
            1 <= n <= 94.

   6        1 if want covariance matrix printed.

   7        1 if want cross-products matrix printed.

   8        Output Address for eigenvalues.  SEQUENTIAL 1-15.

The following parameters for K2Class appear on the program call
card in the following order:

Parameter
Number      Use or Meaning

   1        Input Address; same as output address for K1Class.

   2        Output Address for estimated coefficients.
            SEQUENTIAL 1-15.

   3        Scratch Address.  SEQUENTIAL 1-15.

   4        Input Address for eigenvalues; same as output address
            in K1Class.

   5        Number of exogenous variables in the system.

   6        Number of equations.

   7        Floating point value of k.  (See Special Comments.)
            This value should be enclosed in asterisks.

B.   Sub-parameters

For all K2Class, and for K1Class when LIE is desired, a sub-parameter
card for each equation must follow the main parameter card.  If
subparameters appear for both programs, they must appear in the same
order.

Each  of  these  cards  has  the  following  form: 

Parameter
Number      Use or Meaning

   1        Number of exogenous variables in the equation.

   2        Number of endogenous variables in the equation.

   3        The variable numbers of all variables in the equation,
            in the order:
               1 - exogenous in the equation
               2 - endogenous in the equation, with the variable
                   standardized on last.

IV.   SPECIAL  COMMENTS 

If k = *0*   Ordinary Least Squares Estimates are computed.
If k = *1*   2-Stage Least Squares Estimates are computed.
If k = *-1*  Limited Information Estimates are computed.

When  LIE  is  desired,  the  output  and  input  addresses  for  eigenvalues 
must  be  specified. 
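
To show how the value of k selects the estimator, the Python sketch below works the simplest possible case: one equation y = b x + u with a single endogenous regressor x and a single exogenous instrument z, all taken as deviations from their means. These simplifications and the function names are assumptions made for illustration; this is not the K2Class algorithm, which works from the full covariance matrix.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def k_class_slope(y, x, z, k):
    """k-class estimate of b in y = b*x + u with one instrument z.

    b(k) = (x'y - k x'My) / (x'x - k x'Mx), where M = I - z (z'z)^-1 z'
    removes the part of a vector explained by the instrument.
    """
    xy, xx = dot(x, y), dot(x, x)
    zz, zx, zy = dot(z, z), dot(z, x), dot(z, y)
    x_m_y = xy - zx * zy / zz     # x'My
    x_m_x = xx - zx * zx / zz     # x'Mx
    return (xy - k * x_m_y) / (xx - k * x_m_x)
```

With k = 0 the expression reduces to x'y / x'x (ordinary least squares) and with k = 1 to z'y / z'x (two-stage least squares). Note that *-1* on the call card acts as a request code for LIE rather than as the value of k itself: for limited information estimates k is obtained from the smallest root of a determinantal equation, which is why the eigenvalue addresses must be supplied.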


LINEAR  PROGRAMMING 


General  Description 

LINEAR  PROGRAMMING  maximizes  or  minimizes  a  linear  function  subject  to 
certain  linear  inequalities  called  constraints. 

In  matrix  notation: 

Find the solution to

AX <=, =, >= b   (a system of linear equations or inequalities)
which maximizes (or minimizes)

Z = CX
where X >= 0

A is the matrix of coefficients of the constraints, X the vector of
variables, C the vector of costs or profits associated with each variable, and b
a vector or matrix of non-negative constants which places a bound on the linear
equations.

The equations AX <=, =, >= b in n variables define and bound a space called
the  feasible  space  in  which  all  allowable  values  of  the  n  variables  are  defined. 
The  SIMPLEX  criterion  finds  those  combinations  of  variables  which  optimize  the 
objective  function  within  this  feasible  space.   To  solve  the  system  of  linear 
equations  defined  above,  the  inequalities  must  be  changed  to  equalities.   This 
is  accomplished  by  addition  of  surplus  variables  to  "greater  than"  constraints, 
and  slack  variables  to  "less  than"  constraints.   To  create  the  basis  for  solving 
a  system  of  linear  equations,  an  identity  matrix  must  be  formed  and  augmented 
to  the  A  matrix  of  structural  variables.   Creation  of  the  identity  matrix  is 
completed  by  addition  of  artificial  variables  to  constraints  with  a  "greater 
than"  relational  operator.   The  program  adds  any  needed  variables. 

Since  there  are  more  variables  (structural  +  surplus  +  slack  +  artificial) 
than  rows,  some  method  must  select  which  variables  will  be  in  solution.   The 
SIMPLEX  Algorithm  selects  a  number  of  variables  (equal  to  the  number  of  rows) 
which  will  be  in  solution.   The  final  solution  is  the  maximum  (or  minimum)  of 
the  linear  function  subject  to  the  constraints.   Since  slack  and  surplus  vari- 
ables have  "real"  meaning,  they  may  appear  in  the  final  and  intermediate  solu- 
tions.  Their  presence  as  a  non-zero  value  indicates  that  the  constraint  to 
which  they  were  added  is  not  binding.   Artificial  variables  have  no  "real" 
meaning.   Presence  of  artificial  variables  in  solution  indicates  that  some  con- 
straints are  so  constructed  as  to  preclude  a  solution  which  has  "real"  meaning. 

The  slack  and  surplus  variables  are  given  costs  of  zero  in  the  objective 
function.  Artificial  variables  are  given  large  negative  costs.   SIMPLEX  attempts 
to  drive  artificial  variables  from  solution. 

Failure  to  drive  artificial  variables  from  solution  may  indicate  a  problem 
in  which  constraints  are  mutually  exclusive  or  that  the  cost  assigned  to  the 
artificial  variable  is  not  large  enough. 
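
The bookkeeping described above can be sketched in Python as follows. This is an illustration, not the SOUPAC routine: the function name and data layout are hypothetical, and the treatment of "EQ" rows (an artificial variable only) follows standard practice rather than anything stated in this description.

```python
def augment(constraints, artificial_cost=-1.0e50):
    """Add slack, surplus, and artificial columns to a set of constraints.

    constraints: list of (coefficients, operator, rhs) with operator
    one of "LE", "EQ", "GE".  Returns (rows, kinds, costs) where each
    row is (augmented coefficients, rhs), kinds names every added
    column, and costs gives its objective-function coefficient.
    """
    added = []   # (kind, row index, coefficient in that row)
    for i, (_, op, _) in enumerate(constraints):
        if op == "LE":
            added.append(("slack", i, 1.0))
        elif op == "GE":
            added.append(("surplus", i, -1.0))
            added.append(("artificial", i, 1.0))
        elif op == "EQ":
            added.append(("artificial", i, 1.0))   # assumption, see lead-in
        else:
            raise ValueError("unknown relational operator: %r" % op)

    rows = []
    for i, (coeffs, _, rhs) in enumerate(constraints):
        extra = [coef if row == i else 0.0 for _, row, coef in added]
        rows.append((list(coeffs) + extra, rhs))

    kinds = [kind for kind, _, _ in added]
    # Slack and surplus cost zero; artificials get a large negative cost.
    costs = [artificial_cost if kind == "artificial" else 0.0 for kind in kinds]
    return rows, kinds, costs
```

The columns marked "slack" and "artificial" together form the starting identity matrix, one per constraint row.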

In matrix notation the augmented matrix before calculations begin would
appear as:

[ A | I | S ] = b


where I is the identity matrix of slack and artificial variables and S is the
matrix of surplus variables.  Row operations are performed on the augmented
matrix according to the SIMPLEX criterion.  After any number of row operations,
the inverse matrix of the original coefficients of structural variables now
in solution is contained in the columns where the original identity matrix was
located.  At every stage (row operation) an identity matrix will be present.
This identity matrix indicates the variables in solution.

Since the original table is stored by the program, it is possible to
compare the results of the inverse obtained through LINEAR PROGRAMMING with the
inverse obtained by a standard inversion technique.  The user may set the absolute
value for this comparison in Parameter 3.  If the comparison does not meet the
accuracy requirement, a new table is formed using the original table and the
calculated inverse.  After a feasibility check, the program continues calculations
until an acceptable solution is obtained.

References 

Llewellyn, R.  LINEAR PROGRAMMING.  New York, New York:  Holt, Rinehart,
and Winston, 1966.
Hadley, G.  LINEAR PROGRAMMING.  Reading, Massachusetts:  Addison-Wesley,
1963.

Restrictions 

The program is limited to a maximum of 90 rows or constraints, 300 columns
or variables, and 5 columns in the requirement matrix.

These limits are internal limits and the user is warned that large problems
may exceed the program capacity during accuracy checks or with
multiple column requirement matrices.  Program capacity WILL be exceeded if
the number of constraints + number of structural variables + number of "greater
than" inequalities > 300.
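
A problem can be checked against these limits before cards are punched. The Python sketch below is a convenience illustration only; the function and constant names are hypothetical, and the limits are simply those stated above.

```python
MAX_ROWS = 90                  # rows or constraints
MAX_COLUMNS = 300              # columns or variables
MAX_REQUIREMENT_COLUMNS = 5    # columns in the requirement matrix

def check_lp_size(n_constraints, n_structural, n_greater_than, n_requirement=1):
    """Return a list of the stated size limits that a problem would break."""
    problems = []
    if n_constraints > MAX_ROWS:
        problems.append("too many constraints")
    if n_structural > MAX_COLUMNS:
        problems.append("too many structural variables")
    if n_requirement > MAX_REQUIREMENT_COLUMNS:
        problems.append("too many requirement columns")
    # Every constraint adds one column; each "greater than" adds one more.
    if n_constraints + n_structural + n_greater_than > MAX_COLUMNS:
        problems.append("augmented table too wide")
    return problems
```

An empty list means that none of the stated limits is exceeded; it does not guarantee that the accuracy calculations will fit.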

Input  may  come  ONLY  from  CARDS  in  the  form  of  subparameters. 

Parameters 

All floating point numbers (indicated by FP) must be enclosed by a pair of
asterisks.  All integer numbers (indicated by IN) must be enclosed in parentheses.
The main call to the program and each subparameter must be terminated by a period
(.).

The program is entered by punching the symbols L-P followed by the
appropriate main parameters and subparameters.  All main parameters have default options.

Main Parameters to follow L-P

   1        Cost of artificial variables *large negative FP number*.
            Default = -1.E50.

   2        Minimum value for calculations *FP*.  If any calculation
            falls below this value, it is set to zero.  Default =
            internal calculations.

   3        Value for accuracy check *FP*.  If absolute value for
            calculated difference (See General Description) falls below
            this value, final value is termed inaccurate and calculations
            are performed to correct rounding errors.  Default = .5 .

   4        If 1, suppress print of solution matrix (IN).

   5        If 1, suppress print of check matrix (IN).

   6        Print every INth step, i.e. row operation (IN).

   7        If 1, insert small positive, non-zero number for any zero
            in the b vector.  Useful aid if b vector contains many zeros.

Subparameters

The program now expects to find the word MINimize or MAXimize followed
by a string of constants which represent, in sequential order, the cost or
values associated with each variable.  All non-zero constants (with or without
decimal) must be enclosed by a pair of asterisks.  Zeros may be enclosed by
asterisks.  A series of sequential zeros may be represented by a pair of
parentheses, i.e. the integer number in the pair of parentheses represents the number
of sequential zeros to be inserted.  All coefficients must appear and be in
sequential order.

The  cost  coefficients  representing  the  objective  function  are  terminated  by 
a  period.   The  constraints  are  entered  in  a  similar  manner.  All  variables  must 
be  in  sequence.   Coefficients  of  zero  must  be  included.  Multiple  requirement 
vectors  are  entered  in  the  standard  form.   The  constraint  is  terminated  by  a  per- 
iod.  Comments  which  do  not  include  period  (.),  comma  (,),  asterisks  (*)  or  left 
parenthesis  may  be  entered  at  any  point  outside  those  characters  delimiting  con- 
stants.  The  requirement  vectors  are  separated  from  the  rest  of  the  constraint 
by  relational  operators.   All  coefficients  must  appear  and  be  in  sequential  order. 

The  program  recognizes  three  relational  operators:   LE  (less  than  or  equal), 
EQ  (equal),  and  GE  (greater  than  or  equal).   These  relational  operators  are 
surrounded  by  quotes  (").   See  Section  V.   Examples  in  this  program. 
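
The punching rules above lend themselves to mechanical generation. The Python sketch below builds objective and constraint cards from lists of numbers; it is an illustration with hypothetical function names. Two points are assumptions: numbers are written with Python's `%g` format (so .25 appears as 0.25, which the program may or may not accept), and zeros in the requirement vector are always written out as `*0*` rather than compressed.

```python
def encode_values(values, compress_zeros=True):
    """Encode numbers as *x* fields, writing runs of zeros as (n) if allowed."""
    parts = []
    zeros = 0
    for value in values:
        if value == 0 and compress_zeros:
            zeros += 1
            continue
        if zeros:
            parts.append("(%d)" % zeros)
            zeros = 0
        parts.append("*%g*" % value)
    if zeros:
        parts.append("(%d)" % zeros)
    return "".join(parts)

def encode_objective(direction, costs):
    """Objective card: MIN or MAX, the cost coefficients, then a period."""
    if direction not in ("MIN", "MAX"):
        raise ValueError("direction must be MIN or MAX")
    return direction + encode_values(costs) + "."

def encode_constraint(coefficients, operator, requirements):
    """Constraint card: coefficients, quoted operator, requirement values, period."""
    if operator not in ("LE", "EQ", "GE"):
        raise ValueError("operator must be LE, EQ, or GE")
    return (encode_values(coefficients) + '"' + operator + '"'
            + encode_values(requirements, compress_zeros=False) + ".")
```

For example, `encode_constraint([0, 0, 1, 0], "LE", [1, 2, 3])` gives `(2)*1*(1)"LE"*1**2**3*.`, which matches the third constraint card of the example in Section V apart from its leading comment.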

Output 

The  output  consists  of  the  objective  function,  the  final  solution  matrix, 
the  variables  in  solution,  and  the  optimal  functional  value.   In  addition, 
Shadow  Prices  or  opportunity  costs  are  printed.   Shadow  Prices  provide  useful 
information  on  the  "cost"  of  having  certain  constraints,  or  the  increased  pro- 
fit to  be  obtained  by  'relaxing'  a  particular  constraint. 

For example:

Constraint 1:   1X(1) + 2X(3) <= 5.0
To this constraint, slack variable X(1) is added to make it an equality.  In
the final solution, X(1) is not in solution.  The optimal maximum functional
value is 20.  The 'Shadow Price' on variable X(1) is 2.0.  This means that if
we relax this constraint to 6.0, the optimal maximum value could be 22.0.  For
every unit the constraint is relaxed, the functional value will be changed by
the Shadow Price.  The Shadow Price holds until the constraint is no longer
binding.  The same logic may be applied to "GE" type constraints with surplus
variables.  For interpretation of Shadow Price for structural
variables, the user is referred to texts under headings such as "Dual Algorithm",
"Interpretation of the Dual", and "Opportunity Costs".

Basis variables refer to those variables which form the original identity
matrix.  The variable numbers are listed in the order they were added to the
constraints.  The number of basis variables will always equal the number of
constraints.  To determine whether a basis variable is a slack or artificial
variable, refer to the coefficients of these variables in the objective
function.  A slack variable will have a coefficient of 0.0.

MESSAGES 

PROBLEM  TOO  LARGE:   More  than  300  variables  or  100  constraints  on  input  or 

during  addition  of  slack,  surplus,  and  artificial  variables. 

NORM  FOR  CUTOFF:   Value  of  Main  Parameter  Number  2,  either  supplied  or 

default. 
ERROR  IN  SIMPLX:   Source  Program  Error.   See  a  consultant. 
SOLUTION  UNBOUNDED:   Constraints  do  not  form  a  closed  space.   Optimal 

functional  value  is  infinite. 
NUMBER OF ITERATIONS:   Number of row operations needed to calculate final

solution.  For multiple requirement vectors, number is not valid.

ACCURACY ACCEPTABLE or ACCURACY NOT ACCEPTABLE:   Comparison with Main
Parameter Number 3.

NEW VARIABLES ARE:   Iterations either inaccurate and new variable added
or,  during  execution  of  multiple  requirement  vector,  a  new  variable  had 
to  be  added  to  make  problem  feasible  (requirement  vector  positive). 
NON-RESOLVABLE TIE:   Cannot occur mathematically.  Only reason for occurrence
is rounding error in machine.  Can be corrected by incrementing
or decrementing requirement vector by a small amount.  Perform this
only for constants of same value.  (Use Parameter 7).
Other  messages  should  be  self-explanatory. 

Special Comments

Speed and accuracy can be increased by observing the following suggestions:

1)  Never make Parameter 3 (accuracy check) larger than (0.1) X (number of
significant digits in table).  For example, if numbers in the table are
4, 5, .001, 86, 95.32, you have "one" significant digit.  Set Parameter 3
to *.1*.

2)  Scale numbers in table to get them into same range.  For example, if
table entries are of the order 10^1, and the requirement vectors are of
the order 10^3, scale requirement vectors to 10^1 and rescale solution by
10^2.  The objective function may also be rescaled in a similar manner.

Rescaling essentially reflects the number of significant digits.

V.  Examples

The problem:

Minimize  -.75X(1) + 150X(2) - .02X(3) + 6X(4)
Subject to the following constraints:
Constraint(1)

.25X(1) - 60X(2) - .04X(3) + 9X(4) <= 0, 1, 2
Constraint(2)

.05X(1) - 90X(2) - .02X(3) - 3X(4) <= 0, 1, 2
Constraint(3)

1X(3) <= 1, 2, 3

Could be set up on cards as follows (first card listed first):

ID
EXEC SOUPAC
SOUPAC.SYSIN DD *
L-P*-3.E20****.1*( )( )(3).
MIN*-.75**150**-.02**6*.
*.25**-60**-.04**9*"LE"*0**1**2*.
*.05**-90**-.02**-3.0*"LE"*0**1**2*.
CASH(2)*1*(1)"LE"*1**2**3*.
END PROGRAM
END SOUPAC

This problem will result in an unbounded solution with requirement
vector  number  one.   The  problem  terminates  without  performing  calculations 
on  the  other  vectors. 


Note  insertion  of  sequential  zeros  on  CASH  card. 


MATRIX


I.   GENERAL  DESCRIPTION 

The  MATRIX  program  is  a  data  manipulating  program  for  inputting 
and  outputting,  creating,  performing  matrix  algebraic  operations,  and 
generally  handling  data  matrices.   All  the  MATRIX  suboperations  are 
restricted to 450 columns (variables).  No absolute limit is set on
the  number  of  rows  (observations). 

Standard  SOUPAC  address  conventions  are  used  including  the  use  of 
the  character  X  to  denote  punched  output,  and  (F)  after  a  print  to 
denote  print  with  F  format.  Also  available  and  discussed  in  section  III 
below  is  the  use  of  I  for  storing  a  matrix  in  memory,  and  the  use  of  (L) 
after  a  print  to  invoke  the  MATRIX  labeling  feature.  All  other 
restrictions  are  noted  by  the  discussion  of  the  individual  suboperation 
explanations. 

II.   PARAMETERS 

A.  Main  Parameters 

To  invoke  the  MATRIX  program,  code  the  name  MATRIX  (or  simply 
the  program  mnemonic  MAT) .   There  are  two  optional  parameters 
available  which  may  be  coded  on  the  MAT  card. 

First,  if  it  is  desired  to  print,  immediately  prior  to  the 
execution  of  each  MATRIX  subparameter  operation,  the  time  in 
seconds since entry into the MATRIX program, code a (1) after the
name  MATRIX.   This  option  is  not  normally  needed  and  is  provided 
merely  for  giving  timing  estimates. 

The second optional parameter is coded as a (1) following the
timing estimate parameter.  This second option causes the number
of rows and columns and the precision (either single or double) of
the answer matrix (for all suboperations which produce an answer
matrix) to be printed out.  This is useful in debugging a MATRIX
program when it is not necessary to see the entire answer matrix
printed out, but it would be helpful to check the dimensions of
an  answer  matrix.  As  in  all  SOUPAC  programs,  the  main  program 
parameter  card  must  be  terminated  by  a  period.   Examples: 

MATRIX. 

MATRIX  (1). 

MAT. 

MAT  (1). 

MATRIX  (1)(1). 

MAT ( )(1).   -- recommended usage --




B.   Subparameters 

Any  MATRIX  operation  may  be  invoked  by  coding  its  mnemonic 
followed  by  appropriate  subparameters.  All  operations  in  MATRIX 
handle  both  single  and  double  precision  matrices  at  the  control 
of  the  user  (see  operations  SINGLE  and  DOUBLE).   For  an  address 
not  explicitly  assigned  either  single  or  double  precision,  MATRIX 
assumes  a  default  of  double  precision  for  output  to  the  address. 
Terminate  all  subparameter  cards  with  a  period. 

To  end  a  MATRIX  program,  place  a  card  which  has  the  characters 
END  P  after  the  last  MATRIX  subparameter  card.   Since  all  MATRIX 
programs  must  have  at  least  one  subparameter  operation,  an  error 
will  be  signaled  if  a  MAT  card  is  followed  immediately  by  an 
END  P  card. 

Input and output for MATRIX may be from any source; however, the
following  rules  must  be  observed: 

1)  Never  use  CARDS  as  input  to  any  operation  except 
MOVE  unless  both  the  number  of  rows  and  the  number 
of  columns  have  been  specified  on  the  DATA  format 
card  at  the  front  of  the  data  deck. 

2)  You  may  never  output  to  PRINT  only.   All  MATRIX  output 
must  go  to  some  intermediate  storage  location  even  when 
only  printout  is  desired. 

3)  Avoid  using  the  same  address  more  than  once  on  the  same 
parameter  card  (unless  otherwise  noted  in  the  description 
of  an  individual  suboperation) .   In  all  suboperations 
except  INVERT,  never  specify  an  output  address  which  is 
the  same  as  an  input  address  for  that  suboperation. 

4)  The contents of an input address remain unchanged
during the execution of an operation unless otherwise
noted.

Following  is  a  description  of  the  subparameter  operations  currently 
in  the  MATRIX  program. 

ADD  (mnemonic:   ADD) 

The  ADD  operation  has  from  three  to  twenty-one  address  parameters. 
The  last  address  is  the  output  address;  all  other  addresses  are  for 
input.  Each  input  matrix  must  have  the  same  number  of  rows  and  columns 
as  all  other  input  matrices.  An  address  may  be  used  more  than  once 
as  an  input  address. 

Corresponding  elements  of  the  first  matrix  through  the  next  to 
last  matrix  are  added  together,  and  the  result  goes  to  the  output 
address.  Examples:

ADD (SEQ1)(SEQ4)(SEQ5).

ADD (SEQ1)(SEQ4)(SEQ2)(SEQ3)(SEQ5).
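
For comparison, the same operation on in-memory lists might look like the Python sketch below. It is an illustration only; the function name is hypothetical and SOUPAC addresses are replaced by ordinary list-of-rows matrices.

```python
def matrix_add(*matrices):
    """Elementwise sum of 2 to 20 matrices of identical dimensions."""
    if not 2 <= len(matrices) <= 20:
        raise ValueError("ADD takes from 2 to 20 input matrices")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    for m in matrices:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all input matrices must have the same dimensions")
    return [[sum(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]
```

Passing the same matrix more than once is allowed, mirroring the rule that an address may be repeated as an input.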

ALL (mnemonic:   ALL)

The  ALL  operation  performs  a  particular  test,  specified  by  the 
second  operand  as  a  relational  operator,  between  each  element  for 
each  input  row  and  a  floating  point  number  specified  by  the  third 
operand.   If  all  elements  of  a  row  pass  the  test,  that  row  is  out- 
put to  the  output  address. 

The  first  parameter  is  the  input  address.   The  relational 
operator  is  enclosed  in  quotation  marks.   The  third  operand  may  be 
either  a  floating  point  number  or  an  address  in  which  case  the 
first  element  of  the  matrix  is  used  as  the  floating  point  number. 
The  six  legal  relational  operators  are  "LT",  "LE",  "EQ",  "NE", 
"GT",  and  "GE" .   Examples: 

ALL  (SEQ1)  "NE"  *0.*  (SEQ2) . 
ALL (SEQ3) "GE" (SEQ4) (SEQ1).

ANY  (mnemonic:   ANY) 

The  ANY  operation  performs  a  particular  test,  specified  by  the 
second  operand  as  a  relational  operator,  between  each  element  for 
each  input  row  and  a  floating  point  number  specified  by  the  third 
operand.   If  any  element  of  a  row  passes  the  test,  that  row  is  out- 
put to  the  output  address. 

The  first  parameter  is  the  input  address.   The  relational 
operator  is  enclosed  in  quotation  marks.   The  third  operand  may  be 
either  a  floating  point  number  or  an  address  in  which  case  the 
first  element  of  the  matrix  is  used  as  the  floating  point  number. 
The  six  legal  relational  operators  are  "LT",  "LE",  "EQ",  "NE",  "GT", 
and  "GE".  Example: 

ANY (SEQ2) "GT" *3.* (SEQ1).
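
The row selection performed by ALL and ANY can be pictured with the Python sketch below. It is an illustration only; the function name and the `mode` argument are hypothetical, and matrices are plain lists of rows.

```python
import operator

RELATIONAL = {
    "LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
    "NE": operator.ne, "GT": operator.gt, "GE": operator.ge,
}

def select_rows(matrix, relation, value, mode="ALL"):
    """Keep rows whose elements pass the test against value.

    mode "ALL": every element of the row must pass.
    mode "ANY": at least one element of the row must pass.
    """
    test = RELATIONAL[relation]
    if mode == "ALL":
        combine = all
    elif mode == "ANY":
        combine = any
    else:
        raise ValueError("mode must be ALL or ANY")
    return [row for row in matrix if combine(test(x, value) for x in row)]
```

For instance, `select_rows(m, "NE", 0.0)` drops every row containing a zero, as in the first ALL example.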

COLUMN  DELETE  (mnemonic:   COL) 

The  COLUMN  DELETE  operation  specifies  which  columns  of  an  input 
matrix  are  to  be  deleted  before  sending  the  result  to  the  output 
address.   The  first  parameter  is  the  input  address.   The  second 
parameter is the output address.  The columns to be deleted are
specified as positive integers listed in increasing
order.  Example:

COLUMN  DELETE  (SEQU)  (SEQ5) (h) (5) (6) (8) (13) (15) • 
3NSTANT  (mnemonic:   CON) 

The  CONSTANT  operation  has  three  parameters.   A  floating  point 
number  specified  by  the  second  operand  is  added  to  every  element  of 
the  matrix  specified  by  the  first  operand.   The  result  goes  to  the 
third  operand  address. 

The  second  operand  may  be  either  a  floating  point  number  enclosed 
in  asterisks  or  a  standard  SOUPAC  input  address.   If  an  address  is 
specified,  the  first  element  of  the  matrix  at  the  address  is  used  for 
the  floating  point  number.   Examples: 

CONSTANT (SEQ1) *4.5* (SEQ2).
CONSTANT (SEQ1)(SEQ3)(SEQ4).

DIAGONAL  (mnemonic:   DIA)

The DIAGONAL operation has two operands, an input address and
an output address.  The main diagonal elements of the first
matrix  are  used  to  form  a  single  row  vector  which  is  output  to  the 
second  operand  address.  Example: 

DIAGONAL  (SEQ2)(SEQ4). 

DOUBLE  (mnemonic:   DOU) 

The  DOUBLE  operation  has  anywhere  from  one  to  twenty-one 
addresses  as  parameters.   Listing  an  address  as  a  parameter  negates 
the  effect  of  any  previous  listing  of  that  address  as  a  parameter 
in  the  operation  SINGLE.   Listing  an  address  as  a  parameter  which 
has  not  appeared  as  a  SINGLE  subparameter  has  no  effect.  Example: 

DOUBLE (SEQ1)(SEQ2)(SEQ3)(SEQ4)(SEQ5)(I).

EJECT  (mnemonic:   EJE) 

The  EJECT  operation  causes  the  next  printout  to  begin  at  the 
top  of  a  new  page.  EJECT  has  no  parameters.  Example: 

EJECT. 

E-DIVIDE -- Elementwise Divide -- (mnemonic:  E-D)

The  E-DIVIDE  operation  has  from  three  to  twenty-one  address 
parameters.  The  last  address  is  the  output  address;  all  other 
addresses  are  for  input.  Each  input  matrix  must  have  the  same 
number  of  rows  and  columns  as  all  other  input  matrices  for  the  use 
of  the  operation.  An  address  may  be  used  more  than  once  as  an 
input  address. 

Elements  of  the  second  matrix  through  the  next  to  last  matrix  are 
divided  into  the  corresponding  elements  of  the  first  matrix.  Output 
goes  to  the  last  address.   Example: 

E-DIVIDE  (SEQ1)(SEQ2)(SEQ3). 
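
As a rough illustration of the rule above (each later input matrix is divided, element by element, into the first), here is a minimal Python sketch. The function name e_divide and the list-of-rows representation are assumptions for the example, not part of SOUPAC.

```python
def e_divide(first, *divisors):
    """Divide `first` elementwise by each matrix in `divisors`, in order."""
    result = [row[:] for row in first]
    for d in divisors:
        # Every input matrix must match the shape of the first one.
        if len(d) != len(result) or any(len(a) != len(b) for a, b in zip(d, result)):
            raise ValueError("all input matrices must have the same shape")
        result = [[a / b for a, b in zip(ra, rb)] for ra, rb in zip(result, d)]
    return result

# Three inputs: the second and third are both divided into the first.
quotient = e_divide([[8.0, 6.0]], [[2.0, 3.0]], [[2.0, 1.0]])
```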

E-MULTIPLY -- Elementwise Multiply -- (mnemonic:   E-M)

The  E-MULTIPLY  operation  has  from  three  to  twenty-one  address 
parameters.   The  last  address  is  the  output  address;  all  other 
addresses  are  for  input.  Each  input  matrix  must  have  the  same 
number  of  rows  and  columns  as  all  other  input  matrices  for  the  use 
of  the  operation.   An  address  may  be  used  more  than  once  as  an 
input  address. 

Corresponding  elements  of  the  first  matrix  through  the  next  to 
last  matrix  are  multiplied  together.   Output  goes  to  the  last 
address.  Example:

E-MULTIPLY (SEQ3)(SEQ4)(I)(SEQ5).

E-ROOT -- Elementwise Square Root -- (mnemonic:  E-R)

The  E-ROOT  operation  has  two  address  parameters,  an  input  address 
and  an  output  address.   The  (positive)  square  root  of  each  element 
of  the  input  matrix  is  taken  and  the  result  goes  to  the  output 
address.  Example:

E-ROOT (SEQ1)(SEQ2).

EXPAND  (mnemonic:  EXP) 

The  EXPAND  operation  takes  the  first  row  of  the  first  input 
matrix  and  repeatedly  outputs  that  same  row  until  the  output  matrix 
has  the  same  number  of  rows  as  the  second  input  matrix.   Output 
goes  to  the  third  address.   Example: 

EXPAND  (SEQ1)(SEQ2)(SEQ3). 

FILE  (mnemonic:   FIL)

The  FILE  operation  has  anywhere  from  one  to  twenty-one  addresses 
as  parameters.  FILE  is  used  to  cause  an  end-of-file  mark  to  be 
written  at  the  end  of  a  SEQUENTIAL  file.   This  operation  is  generally 
most  useful  to  the  user  who  wishes  to  place  more  than  one  file  on 
his  own  physical  tape.   Since  any  meaningful  use  of  the  FILE 
operation  requires  the  addition  of  appropriate  IBM  360  JCL  cards, 
all  but  the  most  experienced  users  should  see  a  consultant  in  the 
SOUPAC  office  before  using  this  operation.  Example: 

FILE  (SEQ5). 

GENERATE  (mnemonic:   GEN) 

The  GENERATE  operation  generates  a  single  row  vector  with  the 
floating  point  numbers  the  user  specifies.   The  first  operand  is 
the  output  address.   Remaining  parameters  are  as  many  floating 
point numbers as the user wishes.  Example:

GENERATE (SEQ2) *1.* *2.* *4.* *8.* *16.* *32.*.

HORIZONTAL  AUGMENT  (mnemonic:   HOR)

The  HORIZONTAL  AUGMENT  operation  has  from  three  to  twenty-one 
address  parameters.   The  last  address  is  the  output  address;   all 
other  addresses  are  for  input.   Each  input  matrix  must  have  the 
same  number  of  rows  as  all  other  input  matrices  for  the  use  of  the 
operation. 

Input  matrices  from  the  first  matrix  through  the  next  to  last 
matrix  are  stacked  left  to  right  and  the  result  goes  to  the  last 
address.  Example:

HORIZONTAL AUGMENT (SEQ1)(SEQ3)(SEQ2)(I).

IDENTITY  (mnemonic:   IDE)

The  IDENTITY  operation  has  two  parameters.   An  identity  matrix, 
of  order  specified  by  a  fixed  point  number  as  the  first  operand,  is 
output  to  the  address  specified  by  the  second  operand.   Example: 

IDENTITY (20)(SEQ3).

INPUT  (mnemonic:   INP)

The INPUT operation will input formatted records from any
available device.  This option is primarily for reading card
images or other similar data, usually held on the user's own
tape, which would be awkward to input in the typical card deck
manner.

Never  input  to  I  (see  SPECIAL  COMMENTS)  using  the  INPUT  operation 
unless  both  the  number  of  rows  and  the  number  of  columns  of  the 
input  matrix  are  specified  as  parameters  on  the  INPUT  operation 
parameter  card. 

The  parameters  for  INPUT  are  the  input  address,  the  output 
address,  the  number  of  rows  of  the  input  matrix  (optional  in 
most  cases),  number  of  columns,  and  the  format  enclosed  in  quotation 
marks.   Example: 

INPUT (SEQ1)(SEQ2/PRINT)(20)(5) "(10F8.3)".

INVERT  (mnemonic:   INV)

The  INVERT  operation  inverts  a  non-singular  real  matrix. 
The  INVERT  operation  has  five  subparameters,  the  last  three  of 
which  are  optional.   The  first  parameter  is  the  address  of  the  matrix 
to  be  inverted,  and  the  second  parameter  is  the  output  address  of 
the result.  (The incore address option described in section III.A -
SPECIAL COMMENTS - may be used for either input, output, or both.)
To have the determinant of the original matrix printed out, code a
(1) as the third parameter.

The  inversion  technique  used  is  the  Gauss-Jordan  method  with 
pivot  elements  assumed  to  be  on  the  main  diagonal.   If  it  is 
desired that the inversion technique perform row and column
interchange, for the purpose of picking pivot elements as those with the
largest absolute value at each step of the elimination procedure,
code a (1) as the fourth parameter.  The default case, pivot elements
assumed  to  be  on  the  main  diagonal,  executes  faster  than  when  row 
and  column  interchange  is  performed.  For  those  real  symmetric  matrices 
which  have  the  property  that  the  largest  elements  are  necessarily  on 
the  main  diagonal  (e.g.  correlation,  cross-products,  variance- 
covariance  matrices)  numerical  accuracy  of  the  results  is  not 
significantly  different  between  the  two  options.   For  general  matrices 
in  which  specific  properties  are  not  known,  using  row  and  column 
interchange  will  probably  produce  more  accurate  results. 

The  fifth  argument  is  a  floating  point  number  enclosed  in  asterisks 
which  is  to  be  used  as  the  criterion  for  singularity.   If  the  absolute 
value  of  any  pivot  element  is  less  than  the  criterion  for  singularity, 
the  matrix  is  assumed  to  be  singular.   If  no  value  is  specified,  or 
if *0.* is specified as the fifth parameter, a default value of 10^-8
is used to test for singularity.

INVERT destroys any previous use of the incore address option.
All  calculations  are  done  in  double  precision. 

INVERT also has the ability to solve a set of simultaneous linear
equations if a unique solution exists.  To solve the system indicated
by the matrix equation

AX = Y

input to the INVERT suboperation the augmented matrix [A Y] (i.e.
the constant terms appearing as the last column or columns).  The
resulting output of the INVERT suboperation will be

[A^-1 X].

Y may consist of more than one column vector, in which case each
resulting column vector of X will be the solution for the corresponding
column of Y.  Examples:

INVERT  (SEQ1)(SEQ2). 

INVERT (SEQ4)(SEQ3)(1)(1).

INVERT (SEQ2)(SEQ4)(1)( ) *10.E-5*.

INVERT (SEQ5)(SEQ1)( )(1) *.0000001*.
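
The combined inversion and equation solving described above can be sketched with a small Gauss-Jordan routine in Python. This is a simplified illustration under stated assumptions, not the SOUPAC implementation: the name invert_and_solve is hypothetical, only rows are interchanged when pivoting (the manual describes row and column interchange), and the input is assumed to hold the n columns of A followed by the columns of Y.

```python
def invert_and_solve(ay, n, tol=1e-8):
    """Gauss-Jordan reduction of [A | Y]; returns the rows of [A^-1 | X].

    ay  : list of n rows, each holding n columns of A then the columns of Y
    tol : singularity criterion applied to the absolute value of each pivot
    """
    # Work matrix [A | I | Y]; reducing A to I turns I into A^-1 and Y into X.
    w = [
        [float(v) for v in row[:n]]
        + [1.0 if i == j else 0.0 for j in range(n)]
        + [float(v) for v in row[n:]]
        for i, row in enumerate(ay)
    ]
    for col in range(n):
        # Pick the largest remaining pivot in this column (row interchange only).
        pivot_row = max(range(col, n), key=lambda r: abs(w[r][col]))
        if abs(w[pivot_row][col]) < tol:
            raise ValueError("matrix is singular to the given tolerance")
        w[col], w[pivot_row] = w[pivot_row], w[col]
        pivot = w[col][col]
        w[col] = [v / pivot for v in w[col]]
        for r in range(n):
            if r != col:
                factor = w[r][col]
                w[r] = [a - factor * b for a, b in zip(w[r], w[col])]
    return [row[n:] for row in w]

# A = diag(2, 4), Y = (4, 8)', so A^-1 = diag(0.5, 0.25) and X = (2, 2)'.
result = invert_and_solve([[2.0, 0.0, 4.0], [0.0, 4.0, 8.0]], n=2)
```

Each extra column of Y simply rides along through the elimination, which is why several right-hand sides can be solved in one pass.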

LABEL  (mnemonic:   LAB)

The  LABEL  operation  is  used  to  store  a  title  and  column  labels  at 
a  SOUPAC  address  for  later  use  within  a  MATRIX  program.   The  title 
is  limited  to  128  characters.   Labels  are  limited  to  eight  characters 
each. 

The  first  parameter  is  the  address  where  the  title  and  labels 
are  to  be  stored.   This  is  then  followed  by  the  title  and  labels 
each  enclosed  in  quotation  marks.   Only  one  label  set  is  active  at 
any  one  time.  Hence,  each  use  of  LABEL  overrides  all  previous  uses. 
Labels  generated  within  a  MATRIX  program  may  not  be  passed  to  other 
programs.   (Note:  The  incore  address  option  may  be  used  to  store  a 
title  and  label  set  if  desired.)  Examples: 

LABEL (SEQ4) "TITLE" "LABEL 1" "LABEL 2" "LABEL 3".
LABEL (SEQ1) "ONLY A TITLE; NO LABELS".

MOVE  (mnemonic:  MOV) 

The  MOVE  operation  moves  (actually  copies)  a  matrix  from  one 
SOUPAC  standard  input  source  to  another.   If  reading  from  SEQUENTIAL, 
the  MOVE  operation  assumes  that  the  data  set  was  created  using  SOUPAC 
conventions,  i.e.  by  some  SOUPAC  program.   If  the  input  source  is 
CARDS,  the  input  deck  must  be  preceded  by  a  correct  DATA  format 
statement  and  terminated  by  an  END#  card. 

Never  MOVE  from  CARDS  to  I  (see  SPECIAL  COMMENTS)  unless  both 
number  of  rows  and  number  of  columns  of  the  input  matrix  are 
specified  at  the  front  of  the  data  deck. 

The operation has between two and twenty-one addresses as parameters.
The first address is the input address.  All remaining addresses are
output addresses.  Examples:

MOVE (CARDS)(SEQ1)(SEQ2).
MOVE (CARDS)(SEQ1).

MULTIPLY  (mnemonic:   MUL)

The  MULTIPLY  operation  has  three  addresses  for  parameters.  A 
matrix  multiplication  is  performed  between  the  matrices  on  the  first 
two  addresses  and  the  result  is  stored  in  the  third  address.   The 
MULTIPLY operation permits the same address to be used as an
input address for both first and second operands.  This usage is
equivalent to using the SQUARE suboperation.

The  incore  address  option  may  not  be  used  for  input  of  the  first 
operand  if  the  first  operand  is  different  from  the  second.   The 
incore  address  option  may  never  be  used  for  output  of  the  result. 
The  MULTIPLY  operation  destroys  any  matrix  which  has  been  stored 
in  core  using  the  incore  address  option  (see  section  III. A  -  SPECIAL 
COMMENTS).  All  calculations  are  done  in  double  precision.  Examples: 

MULTIPLY  (SEQ1)(SEQ2)(SEQ3). 
MULTIPLY  (SEQ1)(SEQ1)(SEQ2). 
MULTIPLY (SEQ2)(I)(SEQ5).

OUTPUT  (mnemonic:   OUT) 

The  OUTPUT  operation  outputs  a  matrix,  a  row  at  a  time  under 
format  control,  to  a  user  specified  data  set.   The  output  address 
specified  should  not  be  used  anywhere  else  in  the  current  SOUPAC 
job step except with options which also perform formatted I/O
(e.g.  the  INPUT  and  OUTPUT  operations  of  MATRIX).   The  syntax 
of  OUTPUT  is  two  addresses,  input  and  output  addresses  respectively, 
followed  by  the  desired  format  enclosed  by  quotation  marks.  Example: 

OUTPUT (SEQ1)(SEQ5) "(20E15.7)".

PARTITION  (mnemonic:   PAR) 

The  PARTITION  operation  is  used  to  select  a  sub-matrix  of  an 
original  input  matrix.   The  first  operand  is  the  input  address  and 
the  second  operand  is  the  output  address.   The  next  four  parameters 
specify  in  order,  the  beginning  column  of  the  partition,  the  ending 
column  of  the  partition,  the  beginning  row  of  the  partition,  and  the 
ending  row  of  the  partition.   If  either  beginning  parameter  is  left 
out,  the  partition  begins  with  the  first  row  (or  column).   If  either 
ending  parameter  is  left  out,  the  partition  ends  with  the  last  row 
(column).   Examples: 

PARTITION (SEQ5)(SEQ2)(5)(6)(2)(50).
PARTITION (SEQ4)(SEQ2)(3)(40).
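
The parameter order (columns first, then rows) and the defaults for omitted bounds can be illustrated with a short Python sketch. The function name partition and the use of None for an omitted parameter are assumptions made for the example; bounds are 1-based and inclusive, as in the description above.

```python
def partition(matrix, col_start=None, col_end=None, row_start=None, row_end=None):
    """Select a sub-matrix; None stands for a parameter that was left out."""
    r0 = (row_start or 1) - 1                                   # default: first row
    r1 = row_end if row_end is not None else len(matrix)        # default: last row
    c0 = (col_start or 1) - 1                                   # default: first column
    c1 = col_end if col_end is not None else len(matrix[0])     # default: last column
    return [row[c0:c1] for row in matrix[r0:r1]]

m = [[11, 12, 13],
     [21, 22, 23],
     [31, 32, 33]]
sub = partition(m, 2, 3, 1, 2)    # columns 2-3 of rows 1-2
tail = partition(m, col_start=2)  # columns 2 through last, all rows
```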

PERMUTATION  (mnemonic:   PER) 

The  PERMUTATION  operation  outputs  a  square  matrix  to  the  address 
of  the  first  operand.  Additional  operands  are  positive  fixed  point 
integers.   The  number  of  rows  (and  columns)  in  the  output  matrix  is 
the  number  of  fixed  point  numbers  used.  Each  element  of  each  row 
is a zero except for the column specified as an integer in the
parameter list.  The first integer indicates the column in which to
place a one in the first row; the last integer in the list specifies
the column in which the one is to be placed in the last row.  All other
rows are specified in a like manner.  Example:

PERMUTATION (SEQ3)(5)(4)(3)(2)(1).
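
A minimal Python sketch of the construction described above: row k of the output holds a single one in the column given by the k-th integer. The function name permutation_matrix is an assumption for the example.

```python
def permutation_matrix(columns):
    """Build an n x n matrix with a one in column columns[k] (1-based) of row k."""
    n = len(columns)
    return [[1.0 if c == col else 0.0 for c in range(1, n + 1)] for col in columns]

# Integers 2, 3, 1: row 1 has its one in column 2, row 2 in column 3, row 3 in column 1.
p = permutation_matrix([2, 3, 1])
```

Multiplying a matrix on the left by such an output reorders its rows; multiplying on the right reorders its columns.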

PRINT  (mnemonic:  PRI)

The  PRINT  operation  prints  out  a  matrix,  one  row  at  a  time,  under 
the  control  of  a  user  supplied  format.  Formats  follow  FORTRAN  IV 
conventions  with  the  added  restriction  that  formats  are  limited  to 
592  characters. 

The  first  parameter  is  the  address  of  the  matrix  to  be  printed. 
The  second  parameter  is  the  format  enclosed  in  quotation  marks. 
(Warning:  Allow  for  carriage  control  as  the  first  character  in 
output  lines.  A  print  line  has  133  characters.)  Example: 

PRINT (SEQ2) "(' ',3F20.10)".

PUNCH  (mnemonic:   PUN)

The  PUNCH  operation  has  the  same  syntax  as  PRINT  and  is  used  to 
punch  out  a  matrix  under  the  control  of  a  user  supplied  format. 
(Warning:  When punching cards, remember that there is room for only
80 characters per card.)

The  PUNCH  operation  always  punches  two  cards  in  addition  to  the 
actual  data  deck.   At  the  front  of  the  data  is  punched  a  DATA  format 
card,  and  at  the  end  of  the  data  is  punched  an  END#  card.  Example: 

PUNCH (SEQ1) "(8F10.2)".

RECIPROCAL  (mnemonic:   REC) 

The  RECIPROCAL  operation  has  two  operands,  an  input  address  and 
an output address.  The reciprocals of the square roots of the main
diagonal elements of the first matrix are used to form a single
row vector which is output to the second operand address.  Example:

RECIPROCAL  (SEQ1)(SEQ3). 
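
As a small illustration of what RECIPROCAL produces, the following Python sketch forms the row vector of 1/sqrt(d) for each main diagonal element d. The function name reciprocal is an assumption for the example; applied to a covariance matrix, the result holds the reciprocals of the standard deviations.

```python
import math

def reciprocal(matrix):
    """Return a single row vector of 1/sqrt(d) for each main diagonal element d."""
    return [[1.0 / math.sqrt(matrix[i][i]) for i in range(len(matrix))]]

# Diagonal elements 4 and 16 give 1/2 and 1/4.
row = reciprocal([[4.0, 1.0], [1.0, 16.0]])
```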

REWIND  (mnemonic:  REW) 

The  REWIND  operation  has  anywhere  from  one  to  twenty-one  addresses 
as  parameters.  REWIND  is  used  to  rewind  a  sequential  file.   The 
REWIND  operation  needs  only  to  be  used  with  the  INPUT  operation  when 
it is desired to reread a formatted input file.  Example (to input
the same formatted file from SEQ3 onto both SEQ1 and SEQ2 under
control of different formats):

INPUT (SEQ3)(SEQ1)( )(5) "(10X,5F10.0)".

REWIND (SEQ3).

INPUT (SEQ3)(SEQ2)( )(8) "(8F10.0)".

ROW  DELETE  (mnemonic:   ROW) 

The  ROW  DELETE  operation  specifies  which  rows  of  an  input  matrix 
are  to  be  deleted  before  sending  the  result  to  the  output  address. 
The  first  parameter  is  the  input  address.   The  second  parameter  is 
the  output  address.   Row  numbers  must  be  positive  integers  listed  in 
increasing  order.  Example: 

ROW DELETE (I)(SEQ3)(1)(3)(7)(8)(11)(45).

SCALAR  (mnemonic:   SCA)

The  SCALAR  operation  has  three  parameters.  A  floating  point 
number  specified  by  the  second  operand  is  multiplied  by  every  element 
of  the  matrix  specified  by  the  first  operand.   The  result  goes  to 
the  third  operand  address. 

The  second  operand  may  be  either  a  floating  point  number  enclosed 
in  asterisks  or  a  standard  SOUPAC  input  address.   If  an  address  is 
specified,  the  first  element  of  the  matrix  at  the  address  is  used 
for  the  floating  point  number.  Examples: 

SCALAR (SEQ1) *2.* (SEQ2).
SCALAR (SEQ4)(SEQ3)(SEQ2).

SINGLE  (mnemonic:   SIN) 

The  SINGLE  operation  has  anywhere  from  one  to  twenty-one  addresses 
as parameters.  Listing an address as a parameter causes any matrices
written on that address to be written in single precision.  MATRIX
stores all data matrices in double precision unless the user specifies
otherwise with the suboperation SINGLE.  The listing of an address in
a SINGLE statement in one MATRIX program does not carry over in effect
to any other MATRIX program.  Example:

SINGLE (SEQ1)(SEQ2)(SEQ3).

SUBTRACT  (mnemonic:   SUB) 

The  SUBTRACT  operation  has  from  three  to  twenty-one  address 
parameters.  The last address is the output address; all other addresses
are for input.  Each input matrix must have the same number of rows
and columns as all other input matrices for the use of the operation.

Elements of the second matrix through the next to last matrix are
subtracted from corresponding elements of the first matrix.  Output
goes to the last address.  An address may be used more than once as
an input address.  Examples:

SUBTRACT (SEQ1)(SEQ3)(SEQ5).

SUBTRACT (SEQ4)(SEQ2)(SEQ3)(SEQ1)(SEQ5).

SQUARE  (mnemonic:   SQU) 

The  SQUARE  operation  performs  a  matrix  multiplication  of  a  square 
matrix times itself.  The input matrix is specified by the first
address parameter.  Output goes to the second address.  The incore
address option may not be used as an output address.  SQUARE destroys
any previous use of the incore address storage.  All calculations
are done in double precision.  Example:

SQUARE (SEQ2)(SEQ1).

TRANSPOSE  (mnemonic:   TRA) 

The  TRANSPOSE  operation  transposes  a  matrix  (interchanges  rows 
and columns).  TRANSPOSE destroys any previous use of the incore
address storage.  The two parameters for TRANSPOSE are first the
input address and second the output address of the result.  Example:

TRANSPOSE (SEQ1)(SEQ2).

VECTOR  (mnemonic:  VEC)

The  VECTOR  operation  has  two  operands,  an  input  address  and  an 
output  address.  A  single  vector  from  the  first  location  is  used 
to  form  a  diagonal  matrix  which  is  output  to  the  second  address. 
If  the  input  matrix  has  more  rows  than  columns,  the  first  column 
vector  is  used  to  form  the  diagonal  matrix.  If  the  input  matrix 
has  more  columns  than  rows,  the  first  row  is  used  to  form  the 
diagonal  matrix.  Example: 

VECTOR (SEQ1)(SEQ3).

VERTICAL  AUGMENT  (mnemonic:   VER)

The  VERTICAL  AUGMENT  operation  has  from  three  to  twenty-one 
address  parameters.   The  last  address  is  the  output  address;  all 
other  addresses  are  for  input.   Each  input  matrix  must  have  the 
same  number  of  columns  as  all  other  input  matrices  for  the  use  of 
the  operation. 

Input matrices from the first address through the next to the last
address are stacked top to bottom and the result goes to the last
address.  Example:

VERTICAL AUGMENT (SEQ1)(SEQ2)(SEQ3).

III.   SPECIAL  COMMENTS

A.   Incore  Address  Option 

Besides  the  standard  SOUPAC  addresses,  MATRIX  also  recognizes 
the  additional  address  I.  The  I  symbol  as  an  address  represents 
internal  storage  in  the  machine. 

An  obvious  use  of  this  feature  is  to  cut  down  on  I/O  time  for 
matrices  which  are  to  be  used  in  future  operations  within  the 
current  matrix  program.   The  internal  storage  feature  also  saves 
time  when  the  user  desires  his  output  from  an  operation  to  be 
printed  or  punched.   The  user  must  keep  in  mind  that  data  cannot 
be  passed  to  subsequent  programs  with  the  I  storage.  The  user 
should  also  be  aware  of  the  restrictions  on  I  storage  as  mentioned 
above  in  some  of  the  subparameter  operations  (see  INVERT,  MULTIPLY, 
SQUARE,  and  TRANSPOSE).   In  all  cases  the  use  of  this  option  is  not 
recommended  for  matrices  which  do  not  fit  within  the  memory  available 
to the MATRIX program while running within any particular region size.

1)  To add the matrix on SEQ1 to the matrix on SEQ2 leaving
the result in core and also printing the result, code as
follows:

ADD (SEQ1)(SEQ2)(I/PRINT).

2)  To vertically augment the matrices in core, on SEQ1 and on
SEQ2, storing the result on SEQ4, code as follows:

VERTICAL AUGMENT (I)(SEQ1)(SEQ2)(SEQ4).

B.   Labeled  Output

Provided  in  the  MATRIX  program  is  the  facility  to  title  and  put 
column  labels  on  any  matrix  which  is  printed  using  normal  SOUPAC 
print  conventions.   The  labeling  feature  is  not  allowed  with  the 
PRINT  matrix  operation. 

To  use  the  labeling  feature,  it  is  first  necessary  to  put  the 
title  and  labels  in  a  temporary  storage  area.   This  is  accomplished 
with  the  LABEL  operation  (see  Subparameters) . 

To  use  a  label  which  has  been  placed  in  a  temporary  storage  area, 
code  (L)  after  the  print  portion  of  the  output  address.   If  F 
format  is  also  desired  code  either  (F,L)  or  (L,F)  after  the  print. 
Examples:

1)  To  move  (copy)  the  matrix  on  SEQ1  onto  SEQ5  printing  the 
result  in  F  format  with  title  and  column  labels,  code  as 
follows : 

MOVE (SEQ1)(SEQ5/PRINT(F,L)).

2)  To  add  the  matrices  on  SEQ1  and  SEQ2  storing  the  result  on 
SEQ3  and  also  printing  the  result  with  title  and  column 
labels,  code  as  follows: 

ADD  (SEQ1)(SEQ2)(SEQ3/PRINT(L)). 

3)  To  transpose  the  matrix  stored  on  SEQ1  to  SEQ2,  printing 
out  the  result  in  F  format  with  title  and  column  labels, 
and  punching  out  a  card  deck  of  the  transposed  matrix, 
code  as  follows : 

TRANSPOSE (SEQ1)(SEQ2/PRINT(L,F)/X).


June  29,    1970 
SOUPAC 


MISSING  DATA  CORRELATION 


I.   General  Description 

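The MISSING DATA CORRELATION program calculates the following
coefficients for every combination of variables:

Mean:  $\bar{X}_i = \frac{\sum X_i}{N_i}$

Standard Deviation:  $s_{ij} = \left[ \frac{N_{ij} \sum X_{ij}^2 - (\sum X_{ij})^2}{N_{ij}(N_{ij}-1)} \right]^{1/2}$

Covariance:  $S_{ij} = \frac{N_{ij} \sum (XY)_{ij} - (\sum X_{ij})(\sum Y_{ij})}{N_{ij}(N_{ij}-1)}$

Correlation:  $r_{ij} = \frac{S_{ij}}{s_{x_{ij}} \, s_{y_{ij}}}$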
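
The idea of correlating each pair of variables over only the observations in which both are present can be sketched in Python as follows. This is an illustrative sketch, not the SOUPAC algorithm: the function name pairwise_correlation, the default missing-data code of 99, and the use of NaN for an undefined correlation are all assumptions made for the example. The correlation is computed from raw sums, so any common scale factor in the variance and covariance cancels.

```python
import math

def pairwise_correlation(data, missing=99.0):
    """Return (r, n): correlations and sample sizes for every pair of variables.

    For variables i and j, only rows where neither value equals the
    missing-data code are used, so each coefficient may rest on a
    different sample.
    """
    p = len(data[0])
    r = [[float("nan")] * p for _ in range(p)]
    n = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            pairs = [(row[i], row[j]) for row in data
                     if row[i] != missing and row[j] != missing]
            m = len(pairs)
            n[i][j] = m
            sx = sum(x for x, _ in pairs)
            sy = sum(y for _, y in pairs)
            sxx = sum(x * x for x, _ in pairs)
            syy = sum(y * y for _, y in pairs)
            sxy = sum(x * y for x, y in pairs)
            vx = m * sxx - sx * sx      # N * sum(X^2) - (sum X)^2
            vy = m * syy - sy * sy
            if vx > 0 and vy > 0:
                r[i][j] = (m * sxy - sx * sy) / math.sqrt(vx * vy)
    return r, n

# The third observation of variable 2 is coded as missing (99).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 99.0], [4.0, 8.0]]
r, n = pairwise_correlation(rows)
```

Here the off-diagonal sample size is 3 while variable 1 alone has 4 observations, which is the situation the sample-size output address is meant to document.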


II.   Restrictions 


The  maximum  number  of  variables  for  this  program  is  100. 

The  input  data  to  this  program  may  come  from  any  source  conforming  to 
SOUPAC.  Output  may  be  printed  and  the  correlation  matrix  may  be  placed  on 
any  source  conforming  to  SOUPAC. 


III.   Parameters 


The  parameters  for  the  MISSING  DATA  CORRELATION  program  appear  on  the 
program  card.   They  must  follow  the  program  name  in  the  following  order: 

Parameter
Number      Use or Meaning

1           Input Address.  CARDS or SEQUENTIAL 1-15.
            Default is CARDS.

2           0 - printing as usual
            1 - printing is suppressed

3           Output Address of correlation matrix.

4           Output Address for sample sizes.

5           Coding for missing data; if left blank or if
            zero is entered, minus zero is used as check.
            It is NOT possible for this program to count
            true zeroes as missing data.  This parameter
            must be enclosed in asterisks.  Example:  *99*.

NOTE:  All  output  is  in  double  precision. 

IV.   Special  Comments

A.  The user is warned against further processing of the correlations
output by this program because the correlations do not necessarily
come from the same sample.

B.  For control breaks, data must be presorted on the control variables
with the last variable changing fastest.  The maximum number of control
variables is 20.  Control variables begin on a new card with $ in column
1 and are enclosed in parentheses.

C.  The correlation matrices can be stored if parameter 3 is a temporary
storage address.  However, if control breaks are also being used,
only the first matrix corresponding to the first control break can be
saved.


October  22,  1969 

SOUPAC  (Statistically  Oriented  Users  Programming  and  Consulting) 


MULTIPLE  CORRELATION 

I.    GENERAL  DESCRIPTION 

The MULTIPLE CORRELATION program calculates the following coefficients,
where N = sample size
      P = number of independent variables
      S = the vector of standard deviations
      $\bar{X}$ = independent variable mean
      $\bar{Y}$ = dependent variable mean

Mean:  $\bar{X}_j = \frac{1}{N} \sum_{i=1}^{N} X_{ij}$

Raw Data Cross-Products:  $X'X = \sum (X_i X_j)$

Covariance:  $S_{ij} = \frac{\sum X_i X_j}{N-1} - \frac{(\sum X_i)(\sum X_j)}{N(N-1)}$

Standard Deviation:  $s_i = (S_{ii})^{1/2}$

Product Moment Correlation:  $r_{ij} = \frac{S_{ij}}{s_i s_j}$

The correlation matrix is then partitioned as follows:

$\begin{pmatrix} A & B \\ B' & C \end{pmatrix}$

where A is the independent variables correlation matrix, C is the
dependent variables correlation matrix, and B and B' are the cross-
correlation matrices.

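Standardized Regression Coefficients:  $\beta = A^{-1} B$

Deviation Covariance Matrix:  $D = S (C - B'\beta) S' \left[ \frac{N-1}{N-P-1} \right]$

This leaves the following matrix:

$\begin{pmatrix} A^{-1} & \cdots \\ \cdots & \cdots \end{pmatrix}$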

Deviation (Partial) Correlation Matrix:  $D^{*}_{ij} = D_{ij} / (D_{ii} D_{jj})^{1/2}$

Regression Covariance Matrix:  $RC = S B'\beta S' \left[ \cdots \right]$

Standard Error of Standardized Regression Coefficients:

$s_{\beta_{j.i}} = \left[ (C - B'\beta)_{jj} \, A^{-1}_{ii} / (N-P) \right]^{1/2}$

Multiple Correlation:  $R_j = (B'\beta)_{jj}^{1/2}$

Standard Error of Estimate:  $S_{e_j} = S_j \left[ (1 - R_j^2) \frac{N-1}{N-P-1} \right]^{1/2}$

Unstandardized Regression Coefficient:  $b_{j.i} = \frac{s_j}{s_i} \beta_{j.i}$

Standard Error of Unstandardized Regression Coefficient:

$s_{b_{j.i}} = \frac{s_j}{s_i} s_{\beta_{j.i}}$

T = Regression Coefficient/Standard Error:  $T_{j.i} = \frac{b_{j.i}}{s_{b_{j.i}}}$

Dependent Variable Intercept:  $a_j = \bar{Y}_j - \sum_{i=1}^{P} b_{j.i} \bar{X}_i$

The inverse of the independent variables cross-products matrix is
also printed.

Predicted Dependent Variables (for each row of data):  $Y_j^{*} = a_j + \sum_{i=1}^{P} b_{j.i} X_i$

Deviation From Actual:  $Dev_j = Y_j - Y_j^{*}$
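
To make the relation between the partitioned correlation matrix, the standardized regression coefficients, and the multiple correlation concrete, here is a small Python sketch for a single dependent variable. It is an illustration under stated assumptions rather than the program's own code: the helper names solve and standardized_regression are hypothetical, and the example correlations are made up.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with row pivoting."""
    n = len(a)
    w = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(w[r][c]))
        w[c], w[p] = w[p], w[c]
        pivot = w[c][c]
        w[c] = [v / pivot for v in w[c]]
        for r in range(n):
            if r != c:
                f = w[r][c]
                w[r] = [x - f * y for x, y in zip(w[r], w[c])]
    return [row[n] for row in w]

def standardized_regression(A, B):
    """Return (beta, R) with beta = A^-1 B and R = sqrt(B' beta).

    A : correlations among the independent variables
    B : correlations of each independent variable with the dependent variable
    """
    beta = solve(A, B)
    r_squared = sum(b * w for b, w in zip(B, beta))
    return beta, math.sqrt(r_squared)

# Two independent variables correlated 0.5 with each other and with the dependent variable.
beta, R = standardized_regression([[1.0, 0.5], [0.5, 1.0]], [0.5, 0.5])
```

In this example both standardized coefficients come out to one third and the squared multiple correlation is one third as well.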

For  reference  to  formulas  and  interpretations  see: 
E. C. Bryant, Statistical Analysis, New York, McGraw-Hill, 1960, pp. 198-224.

II.   RESTRICTIONS 

450 variables, 430 Dependent and Independent variables.  (See Parameter 2)

The input data to this program may come from any source conforming to
SOUPAC.  Output will be printed and/or stored as indicated.

III.   PARAMETERS


The  parameters  for  the  MULTIPLE  CORRELATION  program  appear  on  the 
program  call  card.   They  must  follow  the  program  name  in  this  order: 


Parameter
Number      Use or Meaning

1           Input Address.  CARDS or SEQUENTIAL 1-15.
            (See Special Comments for order of variables)

2           Number of independent variables.
            Ind. var. + dep. var. + wt. var. + control var. <= 450.
            Ind. var. + dep. var. <= 430.

3           Output Address of predicted dependent variables.
            SEQUENTIAL 1-15 and/or PRINT.

4           Output Address of deviations from actual.
            SEQUENTIAL 1-15 and/or PRINT.

5           Output Address of Means and Standard Deviations.
            SEQUENTIAL 1-15 and/or PRINT.
            1st column contains Means.
            2nd column contains Standard Deviations.
            3rd column contains Sample Size.[1]

6           Output Address of coefficients.  SEQUENTIAL 1-15,
            PRINT is default.[2]  Coefficients for M independent
            and N dependent variables are written as N rows
            with N + M + 1 columns each.  The ith row contains
            in order the ith intercept term, the M coefficients
            for the ith dependent variable, a -1 in the
            M + i + 1 location, and 0's for all other locations.
            This format is compatible with the ECONOMETRIC
            REDUCED FORM AND RESIDUAL ANALYSIS program.

7           Output Address of correlation matrix.
            SEQUENTIAL 1-15 and/or PRINT.

8           Output Address of raw data cross-products matrix.
            SEQUENTIAL 1-15 and/or PRINT.

9           Output Address for covariance matrix.
            SEQUENTIAL 1-15 and/or PRINT.

10          Output Address of deviation covariance matrix.
            SEQUENTIAL 1-15 and/or PRINT.

11          Output Address of deviation (partial) correlation
            matrix.  SEQUENTIAL 1-15 and/or PRINT.
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Parameter 
Number 


12 


13 


Ik 


15 


16 


Use  or  Meaning 

Output  Address  for  regression  covariance  matrix. 
SEQUENTIAL  1-15  and/or  PRINT. 

Output  Address  of  Durbin-Watson  and  second,  third 

and  fourth  powers  of  sums  of  deviations. 

SEQUENTIAL  1-15  and/or  PRINT. 

Row  1  Durbin-Watson  Coefficients. 

Row  2   Z(y-y*)2 

Row  3   £(y-y*)3 

Row  k     £(y-y*r 

Output  Address  of  inverse  of  augmented  independent 
variables  cross-products  matrix. 3  SEQUENTIAL  1-15 
and/or  PRINT. 

If  weighting  factors  are  desired  to  make  this 
parameter  a  1.   The  weights  must  be  in  each 
row  and  must  be  to  the  right  of  the  dependent 
variables.   Leave  this  parameter  blank  or  zero  if 
option  is  not  wanted. 

Tolerance  used  to  determine  if  correlation  matrix 
is  singular.   If  this  parameter  is  left  blank, 
a  tolerance  of  10" 5  will  be  used.   If  any  other 
tolerance  is  desired,  it  should  be  punched  as 
follows:   *_-E1_*  where  the  blanks  could  be 
filled  in  as  follows:   *13-5E-10*.   This 
parameter  must  be  enclosed  in  asterisks  as 
shown  in  the  examples. 
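The four rows written for parameter 13 can be illustrated with a short Python sketch. It assumes that y* denotes the fitted value of the dependent variable and that the Durbin-Watson coefficient follows the usual textbook definition (sum of squared successive residual differences divided by the sum of squared residuals); the manual does not spell out either point, and the function name is hypothetical.

```python
def residual_summary(y, y_fit):
    """Return [Durbin-Watson, sum e^2, sum e^3, sum e^4] for one dependent variable.

    y     -- observed values, in observation order
    y_fit -- fitted values y* (assumed meaning of y* in the table above)
    """
    e = [obs - fit for obs, fit in zip(y, y_fit)]          # deviations y - y*
    sum2 = sum(v ** 2 for v in e)
    # Textbook Durbin-Watson statistic (an assumption about what SOUPAC computes).
    dw = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e))) / sum2
    return [dw, sum2, sum(v ** 3 for v in e), sum(v ** 4 for v in e)]
```

One such column of four values would correspond to each dependent variable.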


IV.  SPECIAL  COMMENTS 

Twenty-four control variables are allowed and are specified by the
normal conventions, but these variables must be to the right of the dependent
variables and weights. Control variables are not used in the calculations.
If control breaks are used, only the first set of output can be stored on a
SEQUENTIAL address. (Control variables must be pre-sorted, either with
SOUPSORT or by machine.)

Independent  variables  must  be  on  the  left,  then  dependent  variables, 
weights  (if  any),  and  control  variables  (if  any). 

If  an  independent  variable  or  the  only  dependent  variable  is  constant 
a  message  will  be  printed  and  the  sample  will  be  discarded  after  computing 
the  correlation  matrix. 

V.  EXAMPLES 


/*ID

// EXEC SOUP

//SYSIN DD *

MUL  (C)(5)()()(P)()(P) 

ENDS 

DATA(7)(7F1.0)


END# 
/* 


This  program  reads  from  cards,  uses  5  independent  variables  and  2 
dependent  variables.   Means,  Standard  Deviations,  Correlations,  and  default 
options  are  printed. 


/*ID 

//  EXEC  SOUP 

//SYSIN  DD  * 

TRA(C)(S1)(23) 

PER(1)(1,15)(20,22)(16,17)(19)(18)(23).

ENDP

MUL(S1)(18)()(P)()()()(P)()()()()(P)(P)(1)*1.0E-8*.

$  (22)  (23). 

ENDS 

DATA  (23)(23F2.0) 


END# 
/* 

This  program  reads  from  Sequential  1,  uses  18  independent  variables, 
2  dependent  variables,  weights,  and  control  variables.   Output  are  deviations, 
cross  products,  Durbin-Watson  Statistics,  sums  of  the  second,  third,  and 
fourth  powers  of  deviations,  inverse  of  augmented  independent  variable 
cross  products  matrix  and  default  options  are  printed. 

VI.   FOOTNOTES

1
If weights (parameter 15) are used, the weighted sample size will be
output.


2
Sample Size, Regression Coefficients, Standard Errors, Multiple
Correlations, T, and Dependent Variable Intercepts are printed by default.

3
The dependent variable portion of the raw data cross-products matrix
is deleted, leaving the independent variable portion X'X, which is then
augmented with \sum X_i, \sum X_j, and N:

\[
\begin{bmatrix} X'X & \sum X_i \\ \sum X_j & N \end{bmatrix}
\]
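As an illustration of Footnote 3, the augmented matrix can be assembled as follows. This is a sketch only: the function name is hypothetical, the input is assumed to be an unweighted list of observation rows, and the inversion that the program performs afterwards is not shown.

```python
def augmented_cross_products(x):
    """Build the (M + 1) x (M + 1) matrix [[X'X, sum X], [sum X, N]].

    x -- list of N observation rows, each holding the M independent variables
    """
    n = len(x)
    m = len(x[0])
    sums = [sum(row[j] for row in x) for j in range(m)]            # column sums
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(m)]   # raw cross-products
           for i in range(m)]
    top = [xtx[i] + [sums[i]] for i in range(m)]                   # X'X bordered on the right
    bottom = sums + [n]                                            # bordered below, N in the corner
    return top + [bottom]
```

The result equals the cross-products matrix of the data after a column of ones has been appended to it.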

OBLIMAX  ROTATION 


I.   General Description

The OBLIMAX OBLIQUE ROTATION transforms a set of factors F to a new
set V such that the factor kurtosis,

\[
K = \frac{\sum_i \sum_j v_{ij}^4}{\left(\sum_i \sum_j v_{ij}^2\right)^2},
\qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, k,
\]

is at a maximum.
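The criterion itself is simple to evaluate. The sketch below computes K for a loading matrix stored as a list of rows; the function name is hypothetical, and the code shows only the criterion, not the rotation that maximizes it.

```python
def oblimax_criterion(v):
    """Kurtosis criterion K: sum of fourth powers over the squared sum of squares."""
    s4 = sum(x ** 4 for row in v for x in row)
    s2 = sum(x ** 2 for row in v for x in row)
    return s4 / s2 ** 2
```

A matrix whose loadings are concentrated in a few large entries gives a larger K than one whose loadings are spread evenly, which is why maximizing K tends toward simple structure.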

The  purpose  of  the  transformation  is  to  attempt  to  rotate  analytically 
to  a  position  similar  to  that  obtained  by  applying  Thurstone's  rules  for 
simple structure. (See Multiple Factor Analysis, L. L. Thurstone, 1947,
pp. 319-410.) However, Thurstone's rules and the oblimax procedure are
not  the  same,  and  it  is  too  much  to  expect  that  results  obtained  from 
both  procedures  will  agree  exactly. 

It  would  be  desirable  to  solve  directly  for  the  transformation  matrix 
T,  but  unfortunately  no  solution  to  this  problem  has  been  found.   Instead 
oblimax  takes  two  vectors  at  a  time,  solves  for  the  rotational  angles, 
transforms the vectors, and then selects another pair until all k(k-1)
pairs have been rotated. This process is repeated iteratively until the
criterion K no longer increases. Despite the pairwise procedure, K is
well behaved and, in general, approaches a maximum steadily.

For  any  pair  of  factors,  a  and  b,  the  solution  proceeds  as  follows: 

\[
K_j = \frac{\sum_i (a_i \cos\theta_j + b_i \sin\theta_j)^4}
           {\left[\sum_i (a_i \cos\theta_j + b_i \sin\theta_j)^2\right]^2}
    = \frac{\sum_i (a_i + b_i X_j)^4}
           {\left[\sum_i (a_i + b_i X_j)^2\right]^2}
\]

The derivative of K_j is set equal to zero, resulting in a quartic
equation in X, where X = tan θ. Two solutions for X will be maxima; as
each X is found, the sign of the second derivative is inspected to select
the maxima. A small (2 x 2) transform is created, but before post-multiplication
is performed, the transform must be adjusted so that when it becomes a part
of T, the columns t_a and t_b will remain normalized. In this way, both B and T are
developed pair by pair.

For  references  see: 

Pinzka, C., and Saunders, D. R., "Analytic Rotation to
Simple Structure II: Extension to an Oblique Solution."
Research Bulletin RB-54-31. Princeton, N. J.: Educational
Testing Service, 1954.

II.  Alternate  Use 

If the user already has a transformation matrix, he may use it to compute
Vrs, etc., by giving the input address of T in parameter 8; in this case,
the oblimax calculation of T will be skipped. If both F and T are to be
input from cards, then the data deck of F should precede the deck of T.


III.   Output 

The OBLIMAX program always prints the following (unless parameter 8
is used):

1.  The value of K for each pass

2.  The iteration time

It outputs the following on demand (see Parameters):

1.  Transformation matrix T

2.  Reference vector structure, Vrs = FT

3.  Reference vector correlations, Crs = T'T

4.  Diagonal of D and of D^{-1}

    where D is the diagonal matrix of the reciprocal
    square roots of the diagonal elements of Crs^{-1}

5.  Primary factor pattern, Vfp = F T D^{-1} = Vrs D^{-1}

6.  Primary factor correlations, Cfp = D Crs^{-1} D

All data are printed to seven decimal places.
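The relations among the on-demand outputs can be traced with a small pure-Python sketch. It assumes F and T are given as lists of rows and that T already has normalized columns; the helper names (`matmul`, `transpose`, `inverse`, `oblimax_outputs`) are hypothetical, and the Gauss-Jordan inverse is a stand-in for whatever routine the program actually uses.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(i == j) for j in range(n)]
           for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def oblimax_outputs(f, t):
    """Return Vrs, Crs, diag(D), Vfp and Cfp as defined in the list above."""
    k = len(t)
    vrs = matmul(f, t)                                   # Vrs = F T
    crs = matmul(transpose(t), t)                        # Crs = T'T
    crs_inv = inverse(crs)
    d = [1.0 / math.sqrt(crs_inv[i][i]) for i in range(k)]
    vfp = [[row[j] / d[j] for j in range(k)] for row in vrs]            # Vrs D^-1
    cfp = [[d[i] * crs_inv[i][j] * d[j] for j in range(k)]              # D Crs^-1 D
           for i in range(k)]
    return vrs, crs, d, vfp, cfp
```

By construction the diagonal of Cfp is 1, which is what makes it a correlation matrix of the primary factors.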

IV.   Restrictions 

The  number  of  variables  plus  the  number  of  factors  must  be  no  more 
than  300. 

V.   Note  on  Parameter  2 

If  row  normalization  is  specified,  the  normalization  constants  will  be 
preserved and the rows will be rescaled to proper length after rotation and
prior  to  output. 

VI.   Parameters 

The program name, OBLIMAX, appears first on the program call card
and  is  followed  by  the  following  parameters.   Any  output  option  (except 
parameter  9)  may  be  SEQUENTIAL  1-15  and/or  PRINT;  it  may  be  left  blank 
if  not  desired. 

Parameter
Number     Use or Meaning

1          Input Address of F. CARDS or SEQUENTIAL 1-15.

2          If rows are to be normalized before rotation,
           punch a 1; otherwise punch a zero or leave blank.
           (See Note on Parameter 2.)

3          Output Address of T.

4          Output Address of Vrs.

5          Output Address of Crs.

6          Output Address of Vfp.

7          Output Address of Cfp.

8          Input Address of T (see Alternate Use).

9          D-value and inverse of D. PRINT only.



PARTIAL CORRELATION


I.   General Description

This  routine,  upon  option,  provides  two  of  the  more  common  types  of 
special  purpose  correlation  coefficients. 

A.   Partial  Correlations: 

This  program  produces  coefficients  of  net  correlation  of  any  order 
from  1  to  19  in  matrix  form.   Coefficients  of  successively  higher 
order  may  be  obtained  by  repeated  calls  to  the  program,  each  time 
using  as  input  the  previously  generated  partial  correlation  matrix; 
or  several  variables  may  be  held  constant  at  the  same  time  by  one 
call  to  the  program. 

The  general  equation  used  is: 

\[
r_{ij.abc \ldots n} =
\frac{r_{ij.abc \ldots (n-1)} - r_{in.abc \ldots (n-1)} \, r_{jn.abc \ldots (n-1)}}
     {\left(1 - r_{in.abc \ldots (n-1)}^2\right)^{1/2} \left(1 - r_{jn.abc \ldots (n-1)}^2\right)^{1/2}}
\]


References : 

Mills, F. C., Statistical Methods, Holt, Rinehart and Winston,
New York, 1955, 3rd edition.
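The recursion in the general equation can be sketched in Python as below. The function names and the 0-based variable indices are assumptions of this sketch; how the program itself treats the rows and columns of a held-constant variable is not stated in the manual, so here they are simply left at zero.

```python
import math

def partial_out(r, n):
    """Raise the order of a correlation matrix by one, holding variable n constant."""
    size = len(r)
    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == n or j == n:
                continue                      # entries for the held variable stay 0.0
            if i == j:
                out[i][j] = 1.0
                continue
            num = r[i][j] - r[i][n] * r[j][n]
            den = math.sqrt(1.0 - r[i][n] ** 2) * math.sqrt(1.0 - r[j][n] ** 2)
            out[i][j] = num / den
    return out

def partial_correlations(r, held):
    """Apply the recursion once for each variable index in `held`."""
    for n in held:
        r = partial_out(r, n)
    return r
```

Calling `partial_correlations(r, [4, 6, 7])` on a zero-order matrix would mirror a single program call that holds three variables constant; feeding one call's output into the next mirrors the repeated-call usage described above.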

B.   Tetrachoric  Correlations: 

This  type  of  correlation  coefficient  is  used  when  continuous  normally 
distributed  variables  are  measured  dichotomously. 

This  program  is  based  on  a  program  by  Roald  Buhler  at  Princeton 
University  which  in  turn  is  based  on  a  650  program  written  at  the 
Educational  Testing  Service.   The  approximation  used  was  developed 
by  Professor  Ledyard  Tucker. 

II.   Restrictions

A.  Partial Correlations:

Input matrices may be no larger than 140 x 140 and must be compatible
with SOUPAC conventions. In most cases the original input to the
program will be a matrix of zero order correlations (see CORRELATION
program write-up).

B.  Tetrachoric Correlations:

This option is limited to 140 variables. All observations should be
coded either 0 or 1. The program generates cross-count tables before
computing the correlation coefficients.


III.   Parameters


The program name, PARTIAL CORRELATION, should be followed by the
following parameters:


Parameter
Number     Use or Meaning

1          Input Address of R if partial correlations, or of raw
           data if tetrachoric correlations. (R is a
           correlation matrix.)

2          Output Address of correlations desired.

3          0 if tetrachoric correlations are desired;
           1 if partial correlations are desired.

4 - 22     Variables to be held constant when computing
           partial correlations.


IV.   Special  Comments 

When there is a zero cell, or a cell sufficiently close to zero that the
tetrachoric correlation cannot be computed by this approximation, a
value of -1.0 is used if the missing cell is off-diagonal. If a
diagonal cell is zeroish (i.e., if a variable is all zero or all one),
its correlations are set to 0.0.

Blanks  are  counted  as  zeroes. 
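The cross-count table on which these special cases depend can be sketched as follows. Only the 2 x 2 table and the zero-cell check are shown; the tetrachoric approximation itself is not reproduced because the manual does not give it. Function names are hypothetical.

```python
def cross_count(data, i, j):
    """2 x 2 table of counts for dichotomous variables i and j (coded 0 or 1)."""
    table = [[0, 0], [0, 0]]
    for row in data:
        table[row[i]][row[j]] += 1     # table[a][b] counts rows with x_i = a, x_j = b
    return table

def has_zero_cell(table):
    """True when any cell is empty, the case singled out in the comments above."""
    return any(count == 0 for row in table for count in row)
```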

V.   Examples 

A series of observations of 8 variables is used to obtain 3rd
order partial correlations with variables 5, 7, and 8 held constant:

/*ID 

//  EXEC   SOUPAC 

//SYSIN  DD   * 

CORRELATIONS (CARDS)()(SEQ 1).

PARTIAL CORRELATION (SEQ 1)(PRINT)(1)(5)(7)(8).

END  SOUPAC 

DATA(8)(8F6.2) 


END#
/*



PRINCIPAL AXIS FACTOR ANALYSIS
(Eigenvalues and Vectors)


I.   General Description

The purpose of PRINCIPAL AXIS FACTOR ANALYSIS is to determine a
factor matrix, F, given a Gramian matrix, R, of order n such that

\[
F_{(n,f)} \, F'_{(f,n)} = R^*_{(n,n)}
\]

where R* is an approximation to R.

The  column  vectors  of  F  are  defined  as  the  factors  (measures  of 
dimensionality)  of  the  original  matrix,  R.   The  solution  for  the  matrix 
F  is  the  classical  eigen  problem.   Consequently,  the  computations  are 
done by an eigenvalue subroutine. Before output, the eigenvectors, E,
are scaled as follows:

F(I,J) = E(I,J)*LAMBDA(J)**.5

for I = 1,...,n and J = 1,...,n,
to generate the principal axis factors, F.
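The scaling step can be written out directly. In this sketch the eigenvectors are assumed to be stored as the columns of `eigvecs` with unit length, and the function name is hypothetical; the eigenvalue extraction itself is not shown.

```python
import math

def principal_axis_factors(eigvecs, eigvals):
    """Scale eigenvector column J by the square root of eigenvalue J."""
    return [[eigvecs[i][j] * math.sqrt(eigvals[j]) for j in range(len(eigvals))]
            for i in range(len(eigvecs))]
```

When all n factors are kept, multiplying the scaled matrix by its own transpose reproduces R exactly; keeping fewer columns gives the approximation R*.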

For  a  more  detailed  discussion  see: 

Harry Harman, Modern Factor Analysis, Chicago, University of
Chicago Press, 1960, pp. 154-191.

II.   Restrictions

The  input  matrix  for  the  PRINCIPAL  AXIS  program  must  not  exceed  the 
dimensions  of  190  x  190  double  precision.    The  input  matrix  is  further 
limited  to  being  a  square,  symmetric  matrix.   Generally  correlation, 
covariance,  or  cross-product  matrices  are  used  as  input  data.   It  should 
be noted that matrices with large numerical entries, such as cross-products,
may generate output values which cannot be printed under the fixed output
formats. The probability of this happening is very small. Any communality
estimation (i.e., change in the diagonal entries of R) must be
done prior to the input of R to the PRINCIPAL AXIS program.

If  the  communality  estimates  are  used,  the  user  should  check  the 
resulting  roots  for  negative  numbers.   If  any  exist  the  associated  vector 
is  meaningless. 

The input data may come from any source conforming to SOUPAC conventions. Similarly,
the  output  codes  follow  the  established  conventions  and  are  specified  at  the 
option  of  the  user. 

The R matrix may be completely factored (i.e., N factors from an N variable
matrix). However, there are three criteria which may be used to stop
the factoring:

1.  The user may specify the number of factors to be extracted.
    This criterion provides an upper limit beyond which factoring
    will not proceed. Therefore, it is necessary to put the
    maximum value in this limit in cases where it is not the
    primary criterion.

2.  The percentage of total variance removed from R is the
    second limiting criterion. This parameter also specifies
    an upper limit to the process. Therefore, it should be set
    at 100 per cent unless it is the criterion for stopping.

3.  The last criterion is to stop when the factor contribution
    (eigenvalue or root) falls below 1. The use of this procedure
    is dictated by the presence of its parameter.


If  all  three  criteria  are  employed  simultaneously,  factoring  is  stopped 
by  whichever  criterion  is  first  met. 
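The interaction of the three criteria can be sketched as follows. This is an interpretation, not the program's code: it assumes the roots arrive in descending order, that "total variance" means the sum of all roots, and that the percentage test is applied before each further factor is extracted. The function and argument names are hypothetical.

```python
def factors_to_keep(eigenvalues, max_factors, max_percent=100, stop_below_one=False):
    """Count the factors extracted before any one of the three criteria is met."""
    total = sum(eigenvalues)
    kept, removed = 0, 0.0
    for root in eigenvalues:                      # assumed sorted, largest first
        if kept >= max_factors:                   # criterion 1: factor count
            break
        if 100.0 * removed / total >= max_percent:  # criterion 2: variance removed
            break
        if stop_below_one and root < 1.0:         # criterion 3: root below unity
            break
        kept += 1
        removed += root
    return kept
```

Setting `max_factors` to the order of the matrix and `max_percent` to 100 leaves only the root-below-unity test active, which matches the advice above about neutralizing the criteria that are not wanted.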


III.   Parameters 


The  parameters  for  the  PRINCIPAL  AXIS  program  appear  on  the  program 
call  card.   They  must  follow  the  program  name  in  this  order: 


Parameter
Number     Use or Meaning

1          Input Address. CARDS or SEQUENTIAL 1-15.

2          Output Address. SEQUENTIAL 1-15 and/or PRINT.

3          Maximum number of factors to be extracted.
           This must be less than or equal to the
           order of the input matrix.

4          The percentage of total variance to be
           removed, expressed as an integer between
           0 and 100.

5          The presence of a number greater than 0
           indicates that factoring should stop when
           the eigenvalues (roots) fall below unity.

6          Output Address of eigenvectors.

7          The address where eigenvalues are to be
           placed as a row vector if they must be
           stored for further use. If values need
           not be saved, leave parameter blank. PRINT
           is not valid.

8          Mode of sorting eigenvalues and associated
           vectors. The codes are as follows:

           Code   Meaning

           0      Descending algebraic order
           1      Descending absolute values
           2      Order of extraction
           10     Ascending algebraic order
                  (the k smallest roots)
           11     Ascending absolute values
           12     Reverse order of extraction

Leaving any parameter blank is the same as specifying zero. Consequently,
options which are not needed can be avoided by leaving the
associated parameter blank.

IV.   Special  Comments 

No reliable timing estimates exist as yet.



PROBIT
(Mnemonic: PRB)


I.   General Description

This program calculates maximum likelihood estimates for the parameters
A and B in the probit equation:

Y = A + BX

An iterative scheme is used.


II.   Restrictions

The input vectors must be of equal length k, where 3 < k < 3000. Each
input vector comes from a separate input address.


III.   Parameters


Parameter
Number     Use or Meaning

1          Input vector of dosage levels. CARDS or
           SEQUENTIAL 1-15.

2          Input vector of number of subjects tested
           at each dose level. CARDS or SEQUENTIAL 1-15.

3          Input vector containing the number of
           subjects at each level responding to the
           drug. CARDS or SEQUENTIAL 1-15.

4          Output vector of length k containing the
           proportion of subjects responding to the
           various dose levels of the drug. SEQUENTIAL 1-15
           and/or PRINT.

5          Output vector of length k containing the values
           of the expected probit for the various levels of
           the drug. SEQUENTIAL 1-15 and/or PRINT.


Printed output consists of:

1 - Estimate of intercept constant A

2 - Estimate of probit regression coefficient B

3 - Chi-square value for a test of significance of the final
    probit equation

\[
\chi^2 = \sum_{i=1}^{k} \frac{(R_i - N_i P_i)^2}{N_i P_i (1 - P_i)}
\]

    where R_i = number of responses (input address 3)
          N_i = number of subjects tested (input address 2)
          P_i = cumulative normal distribution value corresponding
                to Z_i, where Z_i = (A + B X_i) - 5
          and A and B are from the final probit equation

4 - Degrees of freedom for \chi^2:
    d.f. = k - 2
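The chi-square computation can be sketched in Python once A and B are known. The iterative maximum likelihood fit is not shown; the function names are hypothetical, and the normal distribution function is taken from the standard library's `math.erf` rather than from the subroutine package the program was adapted from.

```python
import math

def normal_cdf(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_chi_square(a, b, x, n, r):
    """Return (chi-square, degrees of freedom) for a fitted probit line Y = A + BX.

    x -- dosage levels, n -- subjects tested, r -- subjects responding
    """
    chi2 = 0.0
    for xi, ni, ri in zip(x, n, r):
        p = normal_cdf((a + b * xi) - 5.0)       # Z_i = (A + B X_i) - 5
        chi2 += (ri - ni * p) ** 2 / (ni * p * (1.0 - p))
    return chi2, len(x) - 2
```

The subtraction of 5 reflects the probit convention of adding 5 to the normal deviate so that working values stay positive.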

References: 

D. J. Finney, Probit Analysis, Second Edition (Cambridge University Press,
1952).

The program was adapted from the IBM Scientific Subroutine Package,
360A-CM-03X, Version III, page 44.

IV.   Example 

If  two  or  more  input  addresses  are  cards,  the  cards  must  be  stacked 
in  order  of  their  parameter  numbers.   For  example: 

/*ID 

//  EXEC  SOUPAC 

//SOUPAC.SYSIN DD *

MAT.

MOVE(CARDS)(SEQ2)

END P

PRB(CARDS)(SEQ2)(CARDS)(PRINT).

END S

DATA(1)( )

Cards for SEQ 2

END# 

DATA(1)( ) 

Cards  for  Parameter  1 

END# 

DATA(1)( ) 

Cards  for  Parameter  3 


END# 
/* 


NOTE:  The mnemonic for PROBIT is PRB, not PRO.


PROCRUSTES  (Oblique  Case) 

I.   General Description

This program offers 3 options:

1.  (Oblique) Procrustes. Given A and B, the program solves

    AT* = B + E

    for T* in a least squares sense (i.e., minimizing tr[E'E]),
    so that

    T* = (A'A)^{-1} A'B,

    and then normalizes T* by columns to yield T = T*D so that
    diag(T'T) = I. It then computes AT which, in a loose sense,
    can be regarded as a least squares fit of A to B under the
    restriction that diag(T'T) = I. It also provides
    C_f = D_n (T'T)^{-1} D_n, where D_n is a normalizing diagonal
    matrix chosen so that diag(C_f) = I. If D gave the cosines between
    tests, C_f will give the factor intercorrelations. A must have
    full column rank.

2.  Dwyer Extension Analysis. Given F = R_tc, a centroid or
    equivalent matrix of cosines between tests t and uncorrelated
    factors c, and L = R_cn, a matrix of cosines between uncorrelated
    factors c and uncorrelated reference vectors n, this
    program computes

    Q = T_tn = F(F'F)^{-1} L

    which is used as a post-multiplier on some correlation
    matrix R_et between the tests t in F and some set of
    extension variables e, giving R_en, the cosines of the extension
    variables e with reference vectors n, to the extent that the former
    can be projected into the subspace spanned by the latter.
    This multiplication

    R_en = R_et T_tn

    can be performed by use of the MATRIX program.

3.  Left Inverse (transposed). Given A, the program will return

    Q = A(A'A)^{-1}

    provided A has full column rank. Q is the transposed left
    inverse of A, which can be used in least squares applications.
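Options 1 and 3 share the same core computation, which the following pure-Python sketch makes explicit. All function names are hypothetical, matrices are lists of rows, and the small Gauss-Jordan inverse stands in for whatever routine the program uses; the C_f matrix and the Dwyer option are not shown.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(i == j) for j in range(n)]
           for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def left_inverse_transposed(a):
    """Option 3: Q = A (A'A)^-1, so that Q'A = I when A has full column rank."""
    return matmul(a, inverse(matmul(transpose(a), a)))

def oblique_procrustes(a, b):
    """Option 1: T* = (A'A)^-1 A'B, then normalize the columns of T*."""
    q = left_inverse_transposed(a)
    t_star = matmul(transpose(q), b)              # Q'B equals (A'A)^-1 A'B
    k = len(t_star[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in t_star)) for j in range(k)]
    t = [[row[j] / norms[j] for j in range(k)] for row in t_star]
    return t, matmul(a, t)                        # T and the fitted matrix AT
```

Because the columns of T are rescaled after the least squares solution, AT fits B only up to a column scale factor, which is the "loose sense" mentioned in option 1.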


II.   Restrictions 


Input is restricted to matrices (A, B, or F) of order 190 x 50 or less.



III.   Parameters


The parameters for this program appear on the program call card.
They must follow the program name in this order:


Parameter
Number     Use or Meaning                                    Procrustes   DEA   LINV

           Input Address.  CARDS or SEQUENTIAL 1-15.         A            F     A

           Input Address.  CARDS or SEQUENTIAL 1-15.         B            L

           Output Address.  SEQUENTIAL 1-15 and/or PRINT.    A            F     A

           Output Address.  SEQUENTIAL 1-15 and/or PRINT.

           Output Address.  SEQUENTIAL 1-15 and/or PRINT.

           Output Address.  SEQUENTIAL 1-15 and/or PRINT.

           Output Address.  SEQUENTIAL 1-15 and/or PRINT.

           Output Address.  SEQUENTIAL 1-15 and/or PRINT.

           Choice Address

           B     AT     E     Q     A(A'A)^{-1}



QUADRATIC PROGRAMMING

I.  General  Description 

This program maximizes the quadratic function cx + 1/2 x^T D x subject to
the linear constraints Ax <= b, where c is an n-vector, D is a symmetric
negative definite n by n matrix, A is an m by n matrix of coefficients
of constraints, and b is an m-vector.

The Kuhn-Tucker theory shows that a solution to the constrained
maximization problem is obtained if and only if vectors x, L, v, and w can be
found such that:

Dx - A^T L + v = -c
Ax         + w = b

where the elements of x, L, v, and w are non-negative and the conditions
x^T v = 0 and L^T w = 0 are satisfied. To find these vectors, artificial vectors
z1 and z2 are added to the first equation and a y-vector is added to the
second. Simplex techniques are then used to eliminate first the y
and then the z1 and z2 variables.

References : 

Carr, C. R. and C. H. Howe, Quantitative Decision Procedures in Management
and Economics, McGraw-Hill, 1964.

Hadley, G., Nonlinear and Dynamic Programming, Addison-Wesley, 1964.

Wolfe,  P.,  "The  Simplex  Method  for  Quadratic  Programming",  Econometrica, 
27,  1959,  pp.  382-398. 

NOTE:   Carr and Howe claim that elements of the w-vector may not be
entered in the first stage of the simplex procedure. Since this requires
that  there  exist  a  solution  to  Ax  =  b,  it  is  a  severe  restriction.   It 
is  also  unnecessary,  and  this  program  does  enter  w-variables  during  the 
first  stage.   Otherwise,  the  procedures  used  closely  follow  those  of 
Carr  and  Howe. 

II.  Restrictions 

The maximum number of x-variables is 40. The number of x-variables
plus the number of constraints must be <= 80.

The D-matrix must be negative definite. If this is in doubt, use the
PRINCIPAL AXIS FACTOR ANALYSIS program to extract the eigenvalues. All
must be negative. Semi-definite D-matrices may be perturbed or the user
may  limit  the  number  of  iterations  to  be  performed.   If  this  limit  is 
exhausted,  final  solution  vectors  will  be  printed  out  (see  below). 

The only form of input is a matrix of data. If there are n x-variables
and m constraints, the matrix should have n + 1 columns and m + n rows,
partitioned as follows:

    D (n x n)     c (n x 1)

    A (m x n)     b (m x 1)

Note that this is the c vector, not the -c vector mentioned in the
Kuhn-Tucker formulas. Also note the + sign and the 1/2 coefficient of the
x^T D x term. All constraints in this type of input are assumed to be of <= type.
Multiply >= constraints through by -1. The equality constraint

\[
\sum_{j=1}^{n} a_{ij} X_j = b_i
\]

is equivalent to the two constraints \sum_j a_{ij} X_j \le b_i and \sum_j -a_{ij} X_j \le -b_i.
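The assembly of the input matrix, including the sign change for >= constraints and the splitting of equalities, can be sketched as follows. The function name and the `(coefficients, sense, rhs)` tuple format are assumptions made for this illustration; the program itself reads only the finished matrix.

```python
def qp_input_matrix(d, c, constraints):
    """Build the (n + m) x (n + 1) matrix [[D, c], [A, b]] in <= form.

    d           -- n x n matrix D, as a list of rows
    c           -- list of n objective coefficients
    constraints -- list of (coefficients, sense, rhs) with sense '<=', '>=' or '='
    """
    rows = [list(d_row) + [c_i] for d_row, c_i in zip(d, c)]     # [D | c]
    for coeffs, sense, rhs in constraints:
        if sense in ("<=", "="):
            rows.append(list(coeffs) + [rhs])                    # keep as given
        if sense in (">=", "="):
            rows.append([-v for v in coeffs] + [-rhs])           # multiplied by -1
    return rows
```

An equality therefore contributes two rows, which counts twice toward the limit on the number of constraints.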

This  matrix  can  be  read  in  from  cards  or  from  temporary  storage.   Only 
one  problem  can  be  read  from  a  tape  or  sequential  location.  Multiple  problems 
must be read from cards, the matrix for each problem preceded by its own
"DATA (N + 1) (FORMAT)" card and followed by its own "END#" card.

The elements of the w-vector are always non-negative and are to be
considered "slack" for <= constraints and "surplus" for >= constraints.

The user may obtain the basis vector at the end of each iteration
showing which variables are in the basis and their quantities (option 2).
He may alternatively have the entire matrix printed out after each iteration
(option 3). The user is cautioned that option 3 can use immense quantities
of paper and time unless the problem is very small.

A  method  outlined  in  Hadley,  pages  183  -  186,  is  used  to  avoid  cycling 
in  cases  of  degeneracy. 


III.   Parameters 


The  program  call  card  should  have  the  name  QUADRATIC  PROGRAMMING 
followed  by  these  parameters: 


Parameter
Number     Use or Meaning

1          Number of problems following.

2          Input Address. SEQUENTIAL 1-15 or CARDS.

3          Output option:
           0 if final results only
           1 if iterated basis vectors
           2 for entire iterated matrix

4          Limit on number of iterations, if desired.
           Leave blank otherwise. Default is 1000.

5          Perturbation quantity. Punch the quantity
           to be subtracted from the diagonal of the
           D-matrix between asterisks instead of
           parentheses; e.g., *.001*. Leave blank
           if not desired.


IV.   Examples 
Example  I 


Suppose we wish to maximize the quadratic function

\[
F = 10x_1 + 20x_2 + 15x_3 - 1x_1^2 - 2x_2^2 + 1x_1 x_2
\]

subject to the constraints

\[
\begin{aligned}
2x_1 + 3x_2 + 1x_3 &\le 50 \\
1x_1 + 3x_3 &\le 70 \\
3x_1 + 2x_2 &\le 60
\end{aligned}
\]

Since  the  D-matrix  is  only  negative  semi-definite,  it  should  be  perturbed 
to  insure  convergence  to  a  solution.   The  following  set  of  cards  would  solve 
the  problem  using  data  matrix  input : 

/*ID 

//  EXEC  SOUPAC 

//SYSIN  DD  * 

QUADRATIC PROGRAMMING (1)(CARDS)(0)(0)*.001*.

END SOUPAC

DATA(4)(4F3.0)

 -2  1  0 10
  1 -4  0 20        (rows 1-3:  D | c)
  0  0  0 15
  2  3  1 50
  1  0  3 70        (rows 4-6:  A | b)
  3  2  0 60
END#

/* 



Example  II : 

Maximize

\[
F = 8x_1 + 10x_2 - x_1^2 - x_2^2
\]

subject to the constraint

\[
3x_1 + 2x_2 \le 6
\]

The D-matrix is negative definite. The problem would be set up as follows:

/*ID 

//  EXEC   SOUPAC 

//SYSIN  DD   * 

QUAD  (1)(C)(2). 

END  SOUPAC 

DATA(3)(F2.0,2F3.0)

-2  0  8
 0 -2 10        (rows 1-2:  D | c)
 3  2  6        (row 3:     A | b)
END#

/*

The extreme value of the objective function for this example is .213 E02.
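The stated optimum of Example II can be checked by hand with the Kuhn-Tucker conditions, assuming the single constraint is active at the solution (the unconstrained maximum at x1 = 4, x2 = 5 violates it). The sketch below does that arithmetic exactly with Python fractions; it is a verification of this one example, not the simplex procedure the program uses, and the function name is hypothetical.

```python
from fractions import Fraction

def example_two_optimum():
    """Solve the stationarity conditions of Example II with the constraint active.

    Stationarity: 8 - 2*x1 = 3*lam and 10 - 2*x2 = 2*lam.
    Active constraint: 3*x1 + 2*x2 = 6.
    Substituting x1 and x2 gives 22 - (13/2)*lam = 6.
    """
    lam = Fraction(16) / Fraction(13, 2)          # multiplier on the constraint
    x1 = (8 - 3 * lam) / 2
    x2 = (10 - 2 * lam) / 2
    f = 8 * x1 + 10 * x2 - x1 ** 2 - x2 ** 2
    return x1, x2, lam, f
```

The result is x1 = 4/13, x2 = 33/13 and F = 3601/169, or about 21.3, which agrees with the printed value .213 E02.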
RANDOM NUMBER GENERATOR


(Mnemonic: RND)


I.   General Description

This program calculates a matrix of normally distributed random
numbers. An approximation formula is used to normalize uniformly
distributed random numbers:

\[
Y = \frac{\sum_{i=1}^{k} X_i - k/2}{\sqrt{k/12}}
\]

where the X_i are the uniformly distributed random numbers, and Y is the
normally distributed number with mean zero (0) and standard deviation
one (1). K is set to 12 by the program. Y is then transformed to the
input scale by multiplying by the standard deviation and adding the mean.
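The approximation can be sketched in Python. Note the substitution: this sketch draws its uniform numbers from Python's `random.Random`, not from the generator in the Scientific Subroutine Package, so it will not reproduce the program's actual number stream. Function names are hypothetical.

```python
import math
import random

def approx_normal(rng, mean=0.0, sd=1.0, k=12):
    """One approximately normal deviate from the sum of k uniform numbers."""
    total = sum(rng.random() for _ in range(k))
    y = (total - k / 2.0) / math.sqrt(k / 12.0)   # mean 0, standard deviation 1
    return mean + sd * y                          # rescale to the requested mean and s.d.

def random_matrix(seed, rows, cols, mean, sd):
    """rows x cols matrix of deviates; the same seed gives the same matrix."""
    rng = random.Random(seed)
    return [[approx_normal(rng, mean, sd) for _ in range(cols)] for _ in range(rows)]
```

With k = 12 the divisor is exactly 1, and every deviate lies within six standard deviations of the mean, so the tails of the normal distribution are truncated.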


II.   Parameters


Parameter
Number     Use or Meaning

1          Input Address of 9 (nine) digit odd integer
           used as a starting point for the random
           number generator. CARDS or SEQUENTIAL 1-15.

2          Output Address of random numbers matrix.
           SEQUENTIAL 1-15. PRINT is not valid.

3          Mean of random numbers enclosed in asterisks,
           e.g., *0.0*.

4          Standard deviation, e.g., *1.0*.

5          Number of rows in output matrix of random
           numbers.

6          Number of columns in output matrix of random
           numbers.

7          Output Address of 9 digit integer which is the
           finishing point of the random number generator.
           Do not specify PRINT since the number is automatically
           printed.


III.   Special Comments


If  this  program  is  used  with  the  same  integer  starting  point,  it  will 
generate  the  same  numbers.   Thus,  use  Parameter  7  to  output  the  finishing 
location,  and  then  pass  that  address  as  the  starting  location  for  the  next 
use  of  this  program. 

Reference 

IBM System/360 Scientific Subroutine Package (360A-CM-03X) Version III,
page 77.



SOUPAC (Statistically Oriented Users Programming and Consulting)


RANK ORDERING PROGRAM


I.   General Description

A.  Purpose 

The RANK ORDERING program receives as input a raw data matrix and
produces as output a matrix in which each element has been replaced
by a number denoting the rank of the element WITHIN ITS COLUMN. In
other words, each column of the input matrix is considered a separate
variate and will be converted to a corresponding ranking.

The smallest variate-value is assigned rank 1.0, the next largest
a rank of 2.0, etc., until the largest variate-value is assigned the
highest rank. In the case of tied values, identical ranks are assigned
to equal values, the rank-number being set equal to the average of the
ranks which would occur if the tied values were distinguishable. This
is sometimes known as the "mid-rank method".
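The mid-rank method for a single column can be sketched as follows. The function name is hypothetical, and the sketch compares values exactly, so it does not reproduce the single-precision comparison behavior noted under Restrictions.

```python
def mid_ranks(values):
    """Rank one column, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0             # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks
```

For the column 10, 20, 20, 30 the two tied values would occupy ranks 2 and 3, so each receives 2.5.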

B.  References 

Kendall,  Maurice  G.,  Rank  Correlation  Methods,  Charles  Griffin  and 
Co., Ltd., London, 1948.

II.   Restrictions

A.  Input

The  input  data  to  this  program  may  come  from  any  source.   If  cards 
are  used  as  input,  the  number  of  rows  in  the  input  matrix  must  be 
specified  on  the  data  format  card  and  the  total  number  of  elements 
in  the  matrix  may  not  exceed  30,000.   The  maximum  number  of  rows 
for  any  matrix  input  to  this  program  is  30,000,  and  the  maximum 
number of columns for any matrix input to this program is 450.

B.  Output

If  an  input  matrix  contains  more  than  30,000  elements,  an  automatic 
partitioning  of  the  input  data  occurs  such  that  each  partition  contains 
the  maximum  number  of  complete  columns  possible  within  the  constraint 
that  no  one  partition  may  contain  more  than  30,000  elements. 
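The partitioning rule can be sketched as a simple calculation. It assumes that partitions are filled left to right with as many whole columns as fit under the 30,000-element limit; the manual states the constraint but not the exact order, and the function name is hypothetical.

```python
def partition_columns(n_rows, n_cols, limit=30000):
    """Return (first, last_exclusive) column ranges, one per partition."""
    per_partition = limit // n_rows               # whole columns that fit in one partition
    if per_partition == 0:
        raise ValueError("a single column exceeds the element limit")
    return [(start, min(start + per_partition, n_cols))
            for start in range(0, n_cols, per_partition)]
```

The length of the returned list is the number of output addresses the user should expect to supply.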

The results of the ranking of each partition are output separately,
one partition per output address specified as a parameter on the program
parameter card. A maximum of twenty-one such output addresses is
allowed.

CAUTION:   If partitioning is anticipated, the user should specify one output
address for each partition anticipated. This warning applies especially in
the case where printed or punched output occurs. Printing and punching will
occur only for the partitions for which it is specified.
The exception is for partitions beyond the twenty-first: printing and
punching are done for them if specified for the twenty-first partition.
However, no partitions beyond the twenty-first
may be stored on a peripheral device (SEQUENTIAL address).

C.   Data 

Since  all  comparisons  in  this  program  are  done  in  single  word  length 
operands,  in  some  cases  the  program  may  not  be  able  to  successfully 
differentiate  between  two  values  which  agree  through  the  first  five 
significant  digits  and  differ  in  subsequent  digits. 


III.   Parameters

The  parameters  for  the  RANK  ORDERING  program  must  follow  the  program 
name  on  the  program  call  card  in  the  order  given  below: 

Parameter 
Number  Use  or  Meaning 

1  Input  Address. 

2-23  Output  Address. 

IV.   Special  Comments 

If the rank-order correlation coefficient ρ (Spearman's rho) is desired,
the  rankings  should  be  input  to  the  CORRELATION  program  (see  individual 
program  description)  and  the  Product  Moment  Correlation  coefficient 
obtained. 
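
The relation described above (Spearman's rho is the product moment correlation of the ranks) can be illustrated with a minimal sketch. The function names are hypothetical, and the use of average ranks for tied values is an assumption; this write-up does not say how the ranking program treats ties.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Product moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Rank both variables, then correlate the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```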

CAUTION: Input to the CORRELATION program is restricted to 175 columns
(variables).


SOUPAC (Statistically Oriented Users Programming and Consulting)


SCALOGRAM  ANALYSIS 


General  Description 

The SCALOGRAM ANALYSIS program (mnemonic: SCA) was developed to provide a
method of producing Guttman scales automatically, without the need for
external decisions about which items do and which items do not enter
into Guttman scales. Items are grouped into as few submatrices as
possible, with maximum homogeneity within each submatrix. Each item
from the total group is assigned to only one submatrix.

The SCALOGRAM program starts by choosing an item from the total
group and then searches the remaining items for an item
similar to the one chosen. Similarity is tested with an error
criterion and a chi-square test. If both
criteria are met, the new item is added to the first item and a scale
is formed. This last item is then used to find another similar item, and
the procedure continues until either of the two criteria is not met.
Whenever a criterion fails, the scale is terminated and a new scale is
started.
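
As a rough illustration of the kind of pairwise test described above, the sketch below cross-tabulates two dichotomous items, counts scale errors, and computes the ordinary 2 x 2 chi-square. The exact error criterion and significance levels used by SCA are not given in this write-up (see Lingoes 1963), so the definition of an error here (the smaller off-diagonal cell) and the function names are assumptions.

```python
def two_by_two(item_x, item_y):
    """Cross-tabulate two 0/1 items: returns counts (a, b, c, d)
    for the response pairs (1,1), (1,0), (0,1), (0,0)."""
    a = b = c = d = 0
    for x, y in zip(item_x, item_y):
        if x == 1 and y == 1:
            a += 1
        elif x == 1 and y == 0:
            b += 1
        elif x == 0 and y == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d


def pair_statistics(item_x, item_y):
    """Return (errors, chi_square) for a pair of dichotomous items.

    Assumption: in a perfect two-item Guttman scale one off-diagonal
    cell is empty, so the smaller off-diagonal count is taken as the
    number of errors.
    """
    a, b, c, d = two_by_two(item_x, item_y)
    n = a + b + c + d
    errors = min(b, c)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi_square = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return errors, chi_square
```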

SCALOGRAM works only for dichotomous data; it can be used
to analyze both subject-wise and item-wise. SCALOGRAM differs from
Guttman analysis in three ways: 1) It uses an empirical rather than a
rational basis for selecting items to enter a scale. 2) It uses a
statistical method for deciding on groups and for testing the
scalability of the item. 3) It yields multiple scales rather than rejecting
the scale hypothesis for the whole item set.

SCALOGRAM can be considered to be more descriptive than the raw data
but less than factor analysis. SCALOGRAM is also unlike factor analysis
in that SCALOGRAM is not bound to linear assumptions about the regressions
involved. Factor analysis is set up to study quantitative variables and
will not show correct relationships between qualitative variables; SCALOGRAM
will show what relationships do exist between qualitative variables. (See
Guttman 1950 for a complete discussion of the relation between the scalogram
technique and other statistical procedures.) (See Lingoes 1963 for the
complete algorithm for SCALOGRAM.)

References:

Guttman, L. "Relation of Scalogram Analysis to other Techniques."
In Stouffer, et al., Measurement and Prediction. Princeton, N.J.:
Princeton University Press, 1950 (pp. 172-212).

Lingoes, J. C. "Multiple Scalogram Analysis: A Set-Theoretic Model
for Analyzing Dichotomous Items." Educational and Psychological
Measurement, XXIII (1963), 501-524.

Lingoes,  J.  C.   "A  Multiple  Scalogram  Analysis  of  Selected  Issues 
of  the  83rd  U.S.  Senate."  American  Psychologist,  XVII  (1962),  327- 

II.   Parameters

The program mnemonic is SCA. The following parameters appear on
the program card:

Parameter
Number  Use or Meaning

1  Input Address

2  Address of Labels

3  A 1 indicates that the matrix should be
transposed


Since  the  program  scales  by  columns  or  items,  to  scale  by  subjects 
the  matrix  must  be  transposed. 

Labels can be used to describe items; they can be input from cards
or tape. A maximum of 28 characters is allowed per label, and labels should
be expressed as follows: DATA(n)(nA4) where n ≤ 7. A separate card for
each label is most convenient to use, with the description in the first
28 columns. If both labels and data are being input from cards, labels
must precede data.

III.   Restrictions 

Both the number of items and the number of subjects are restricted
to 490. Labels are restricted to 28 characters.

Data must be coded as 0's and 1's. If the data are not in this form,
TRANSFORMATIONS may be used to recode them.

IV.   Examples 

SCA(C)(C)(1).
ENDS
DATA(7)(7A4)
   labels
END#
DATA(40)(40F1.0)
   data
END#

Labels and data are on cards, 28 columns are used for labels, and scaling
will be done by rows.

SCA(S1).
ENDS
DATA(30)(30F1.0)
   data
END#

Data is on SEQUENTIAL 1 and scaling will be done by columns.


September  26,  1969 



SQUARE  ROOT  FACTOR  ANALYSIS 
General  Description 

The SQUARE ROOT method of factor analysis, also called the Diagonal
Method by L. L. Thurstone, decomposes a correlation matrix R (or any
other positive semi-definite or definite symmetric matrix) such that

\[ R = F F' + R^{(k+1)} \]

where $R^{(k+1)}$ is the residual matrix after extracting k factors. Of course,
if all n factors are extracted, the residual matrix becomes a null matrix.

The factor $f_j$ is computed by dividing each element of the j-th column
of R by the square root of its diagonal element:

\[ f_{ij} = \frac{r_{ij}}{\sqrt{r_{jj}}} \qquad (i = 1, 2, \ldots, n) \]

The matrix $A = f_j f_j'$ is then subtracted from R and the operation repeated
on the residual matrix.

Prior  to  the  widespread  use  of  high  speed  computers,  the  SQUARE  ROOT 
method  was  sometimes  used  as  a  substitute  for  the  PRINCIPAL  AXIS  or  CENTROID 
method  due  to  the  relative  ease  of  computing  a  square  root  factor.  When 
used  in  this  way,  one  seeks  to  extract  the  maximum  variance  for  each  factor, 
in which case Parameter 4 should be blank. The program then selects
the  next  column  on  the  basis  of  the  largest  residual  column  sum  of  squares. 

Nowadays,  however,  the  SQUARE  ROOT  method  is  more  likely  to  be  used 
for  special  purposes.  By  selecting  successive  pivot  variables,  the  user 
retains  control  over  the  factoring.  Factors  are  passed  directly  through 
the  test  variables  and  the  effect  of  these  variables  is  removed  from  the 
matrix.  The  communalities  or  row  sums  of  squares  are  the  squared  multiple 
correlations  of  the  remaining  variables  with  the  pivot  variables. 
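
The extraction step described in this section can be sketched as follows. This is a minimal illustration, not the SOUPAC code: the function name, the 0-based column indices, and the tolerance are assumptions. When no pivots are supplied, the next column is chosen by the largest residual column sum of squares, as described above.

```python
from math import sqrt


def square_root_factors(r, k, pivots=None, tol=1e-12):
    """Extract k square root (diagonal) factors from a symmetric matrix r.

    r      : list of lists, symmetric positive (semi-)definite
    pivots : optional list of k column indices (0-based) chosen by the user
    Returns (factors, residual); factors[s][i] is the loading of
    variable i on factor s.
    """
    n = len(r)
    resid = [row[:] for row in r]
    factors = []
    used = []
    for step in range(k):
        if pivots is not None:
            j = pivots[step]
        else:
            candidates = [c for c in range(n) if c not in used]
            j = max(candidates,
                    key=lambda c: sum(resid[i][c] ** 2 for i in range(n)))
        diag = resid[j][j]
        if diag <= tol:
            raise ValueError("pivot diagonal element is not positive")
        # Divide the pivot column by the square root of its diagonal.
        f = [resid[i][j] / sqrt(diag) for i in range(n)]
        # Subtract f f' to obtain the next residual matrix.
        for i in range(n):
            for m in range(n):
                resid[i][m] -= f[i] * f[m]
        factors.append(f)
        used.append(j)
    return factors, resid
```

Extracting all n factors from a positive definite matrix should leave a residual that is zero up to rounding error, which gives a simple check on a run.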

The pivots selected may be any columns in the matrix. Let us assume,
however, that these are adjacent to each other in the upper left-hand
corner of the partitioned matrix below:

\[ R = \begin{bmatrix} R_{pp} & R_{ps} \\ R_{sp} & R_{ss} \end{bmatrix} \]

Then the effect of pivoting successively on the variables in the upper
left-hand corner is shown by the residual matrix as follows:

\[ R^{(p+1)} = R - F_p F_p' = \begin{bmatrix} 0 & 0 \\ 0 & R_{ss} - R_{sp} R_{pp}^{-1} R_{ps} \end{bmatrix} \]


Restrictions 


A.   Dimension 

Maximum  size  of  the  R  matrix  is  190  variables. 

B.   Special Conditions

1.  The researcher may specify the extraction of any number of
factors up to the dimension of R.

2.  The researcher may specify the diagonal element to be used
in the extraction of each factor, or he may have the procedure
remove the maximum variance each time.

3.  The residual matrix may be saved if the researcher desires.


III.   Parameters 


Following  the  program  name  the  parameters  must  appear  in  the  following 
order  on  the  program  call  card: 


Parameter
Number  Use or Meaning

1  Input Address. CARDS or SEQUENTIAL 1-15.
(Correlation or positive definite or semi-definite matrix).

2  Output Address. SEQUENTIAL 1-15 and/or PRINT.

3  Number of factors extracted.

4  Input Address for diagonal elements.
CARDS or SEQUENTIAL 1-15.

5  Output Address for residual matrix.
SEQUENTIAL 1-15 and/or PRINT.


IV.   Special  Comments 

If  the  diagonal  element  for  each  factor  is  specified,  and  if  both 
input  addresses  are  cards,  then  data  precedes  diagonal  specification. 


V.   Example 


Assume you have a 20 x 20 correlation matrix on cards and that you
want to extract 15 factors; you are also reading the pivot columns from
cards. The program would be set up as follows:

/*ID
// EXEC SOUPAC
//SYSIN DD *
SQU(C)(P)(15)(C)(P).
END SOUPAC
DATA(20)(8F9.7)
   data
END #
DATA(15)(15I2)
   diagonal specification card(s)
END #
/*

STANDARD SCORES


General Description

This program is used to calculate the following:

Mean:
\[ \bar{X}_j = \frac{1}{N} \sum_{i=1}^{N} X_{ij} \]

Standard Deviation:
\[ s_j = \left[ \frac{N \sum_{i=1}^{N} X_{ij}^2 - \left( \sum_{i=1}^{N} X_{ij} \right)^2}{N^2} \right]^{1/2} \]

Variance:
\[ v_j = s_j^2 \]

Standardized Scores:
\[ Z_{ij} = \frac{X_{ij} - \bar{X}_j}{s_j} \]

Moving Averages:
\[ \bar{X}_{ij} = \frac{\sum X_{ij}}{b} \]

where b = number of periods (the sum runs over the b observations in the period).
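
The quantities defined above can be sketched for a single variable as follows. The function names are hypothetical. Three points are assumptions rather than documented behavior: the N-1 denominator is taken to be N(N-1) in place of N squared, scores about a specified mean and standard deviation are taken to be the specified mean plus the specified standard deviation times Z, and the moving average is taken over a trailing window of b observations.

```python
from math import sqrt


def column_stats(x, use_n_minus_1=False):
    """Return (mean, standard deviation, variance) of one variable."""
    n = len(x)
    if n == 1 and use_n_minus_1:
        # The write-up states that all three are set to zero in this case.
        return 0.0, 0.0, 0.0
    total = sum(x)
    total_sq = sum(v * v for v in x)
    mean = total / n
    denom = n * (n - 1) if use_n_minus_1 else n * n
    variance = (n * total_sq - total ** 2) / denom
    return mean, sqrt(variance), variance


def standard_scores(x, use_n_minus_1=False, new_mean=0.0, new_sd=1.0):
    """Z scores, optionally re-expressed about new_mean and new_sd."""
    mean, sd, _ = column_stats(x, use_n_minus_1)
    return [new_mean + new_sd * (v - mean) / sd for v in x]


def moving_averages(x, b):
    """Averages over each run of b consecutive observations."""
    return [sum(x[i - b + 1:i + 1]) / b for i in range(b - 1, len(x))]
```

For example, `standard_scores(x, new_mean=50.0, new_sd=10.0)` would give scores with mean 50 and standard deviation 10 under these assumptions.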


II.   Restrictions 

A.  The maximum number of variables is 450.

B.  Means, standard deviations, and N's may be calculated using as many
as 24 control variables. Data must be presorted (for instance with
SORT-MERGE or on a card sorting machine) on the control variables.

C.  The  maximum  period  is  20  for  moving  averages. 

D.  Output  is  of  four  categories: 

1.  With  or  Without  Control  Breaks 

a.  Sample  size,  mean,  standard  deviation  and  variance. 

b.  Moving  averages:   sample  size,  mean,  standard 
deviation  and  variance. 

2.  Without  Control  Breaks 


a.  Standard  scores,  printed  output  includes  sample  size, 
mean,  standard  deviation  and  variance.   Missing  data 
option  for  mean,  etc.,  is  possible,  but  user  is  cautioned 
against  its  use.   Means,  etc.,  may  be  output  to  a 
temporary  storage  location. 

b.  Standard scores about a given mean and standard deviation.
Printed output includes sample size, mean, standard
deviation and variance. Missing data option for mean,
etc., is possible. Output may consist of standard
scores and of scores about a given mean and standard deviation.
Output may be to print and/or two different temporary
storage locations. Means, etc., may be output to a
temporary storage location.

III.   Parameters


The parameters appear on the program call card following the
program name STANDARD SCORES in this order:

Parameter
Number  Use or Meaning

1  Input Address. SEQUENTIAL 1-15. Cards if only
means, standard deviations, and variance desired.

2  Output Address of Standardized Scores.

3  Output Address for Mean, Sample Size,
Standard Deviation, and Variance. Sample
Size, Mean, Standard Deviation, and Variance
can be put out on a temporary unit.
Output is in the form of four column
vectors ($N$, $\bar{X}_j$, $S_j$, $V_j$).

4  If 1, use N-1; if 0, use N for the denominator
of standard deviations. N-1 gives an unbiased
estimate of the population standard deviation
and population variance. N gives the sample
standard deviation and sample variance.

5  Output Address for Standard Scores about a
specified Mean and Standard Deviation.
SEQUENTIAL 1-13 and/or PRINT.

6  If parameter 5 is being used, place desired
Mean between asterisks, for example, *50*.

7  If parameter 5 is being used, place desired
Standard Deviation between asterisks, for
example, *5*.

8  Moving Averages: Put the number of periods
(observations) over which it is desired that
the data be averaged (i.e., b). If control
variables are being used and/or the actual
number of observations is less than stated,
the data will be averaged using the actual
number of observations.

9  If set equal to 1, blanks coded as -0.0 will
be checked for. $\bar{X}_j$, $S_j$, and $V_j$ will reflect
the reduced N.

If using controls, on a separate card immediately after the STANDARD
SCORES card, list variable numbers of those variables used as controls.
For example, if controlling on variables 1, 2, and 4:

STA(T2)(P).
$C-B(1)(2)(4).

NOTE: If there is only 1 observation and parameter 4 is set to 1, then the
mean, standard deviation and variance for that variable will be set to zero.

If  blanks  are  checked  and  standard  scores  are  requested,  those  observations 
which  have  a  blank  will  remain  blank  after  calculation  of  standard  scores. 
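
The role of control variables and the presorting requirement stated in the restrictions can be illustrated with a short sketch. It assumes that a control break occurs whenever the values of the control variables change from one row to the next; the function name and the 0-based column positions are hypothetical.

```python
from itertools import groupby


def stats_by_control(rows, control_cols, value_col):
    """Sample size and mean of one variable within each control group.

    rows must already be sorted on the control columns, because a new
    group starts each time the control values change.
    """
    def control_key(row):
        return tuple(row[c] for c in control_cols)

    results = []
    for key, group in groupby(rows, key=control_key):
        values = [row[value_col] for row in group]
        n = len(values)
        results.append((key, n, sum(values) / n))
    return results
```

If the rows were not presorted, the same control values would appear as several separate groups, which is why the sort is required.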

STEP-WISE MULTIPLE CORRELATION

General Description

The STEP-WISE MULTIPLE CORRELATION program calculates the following:

Mean:
\[ \bar{X}_i = \frac{\sum X_i}{N} \]

Covariance:
\[ s_{ij} = \frac{N \sum (X_i X_j) - (\sum X_i)(\sum X_j)}{N(N-1)} \]

Standard Deviation:
\[ s_i = (s_{ii})^{1/2} \]

Product Moment Correlation:
\[ r_{ij} = \frac{s_{ij}}{s_i s_j} \]


In  the  step-wise  procedure,  intermediate  results  are  used  to  give 
valuable  statistical  information  at  each  step  in  the  calculation.   These 
intermediate  answers  are  also  used  to  control  the  method  of  calculation. 
A  number  of  intermediate  regression  equations  are  obtained  by  adding  one 
variable  at  a  time  thus  giving  the  following  intermediate  equations. 

a.  Y = B0 + B1X1

b.  Y = B0 + B1X1 + B2X2, etc.

The  coefficients  for  each  of  these  intermediate  equations  and  the 
reliability  of  each  coefficient  are  obtained  by  the  step-wise  procedure. 
The  values  and  reliability  may  vary  with  each  subsequent  equation.   The 
coefficients  represent  the  best  values  when  the  equation  is  fitted  by  the 
variables  included  in  the  equation.   The  variable  is  added  that  makes  the 
greatest  improvement  in  "goodness  of  fit"  or,  stated  another  way,  gives  the 
greatest  reduction  in  variance  of  the  dependent  variable. 

A  variable  may  be  indicated  to  be  significant  at  an  early  stage  and 
enter  the  regression  equation.  After  several  other  variables  are  added  to 
the  regression  equation,  a  variable  in  the  equation  may  be  indicated  to  be 
insignificant.  Under  this  situation  the  step-wise  regression  procedure  will 
remove  the  insignificant  variable  before  adding  an  additional  variable. 
Thus,  at  the  various  steps  in  the  regression  procedure,  only  those  variables 
which  are  significant  will  be  included  in  the  regression  equation. 

The  F  level  to  enter  a  variable  controls  when  variables  enter  the 
equation  and  the  F  level  to  remove  a  variable  likewise  controls  the  removing 
of  variables  from  the  equation. 
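
The entry and removal logic described above can be sketched as follows. This is not the method of Ralston and Wilf cited in this write-up: for clarity the sketch simply refits each candidate equation by ordinary least squares and compares residual sums of squares. The function names, the default F levels, and the step limit are assumptions.

```python
def rss(columns, y):
    """Residual sum of squares of y regressed on a constant plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Build the normal equations (X'X | X'y) and solve by Gauss-Jordan.
    aug = []
    for r in range(p):
        row = [sum(design[i][r] * design[i][c] for i in range(n))
               for c in range(p)]
        row.append(sum(design[i][r] * y[i] for i in range(n)))
        aug.append(row)
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[r][p] / aug[r][r] for r in range(p)]
    return sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2
               for i in range(n))


def stepwise(xs, y, f_enter=4.0, f_remove=4.0, max_steps=20):
    """Return the indices of the variables in the final equation."""
    n = len(y)
    chosen = []
    for _ in range(max_steps):  # guards against endless enter/remove cycles
        current = rss([xs[j] for j in chosen], y)
        if chosen:
            # F to remove: loss of fit when one variable is dropped.
            df = n - len(chosen) - 1
            f_out = {}
            for j in chosen:
                reduced = rss([xs[m] for m in chosen if m != j], y)
                f_out[j] = (reduced - current) / (current / df)
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:
                chosen.remove(worst)
                continue
        # F to enter: gain in fit when one more variable is added.
        df = n - len(chosen) - 2
        f_in = {}
        for j in range(len(xs)):
            if j not in chosen:
                new = rss([xs[m] for m in chosen] + [xs[j]], y)
                f_in[j] = (current - new) / (new / df)
        if not f_in:
            break
        best = max(f_in, key=f_in.get)
        if f_in[best] < f_enter:
            break
        chosen.append(best)
    return chosen
```

The sketch does not handle an exact fit (zero residual) or a singular set of variables; the program's tolerance parameter addresses the latter.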

The last step in the step-wise procedure predicts the value of the
dependent variable for each set of observations based on the final
regression equation. Deviations between the actual and predicted values are
also calculated. (See parameter 4.)

For  reference  to  formulas  and  methods  used  see : 

A. Ralston and H. S. Wilf, Mathematical Methods for Digital
Computers, New York, Wiley and Sons, 1960, pp. 191-195.

II.   Restrictions

The maximum number of independent variables in this program is 199.
The  dependent  variable  must  be  the  last  variable  of  each  row. 

The  input  data  to  this  program  may  come  from  any  source  conforming  to 
SOUPAC.   Output  may  be  PRINT  only. 

III.   Parameters 

The  parameters  for  the  STEP-WISE  MULTIPLE  CORRELATION  program  appear 
on  the  program  call  card.   They  must  follow  the  program  name  in  this  order: 

Parameter 
Number  Use  or  Meaning 

1  Input Address. CARDS or SEQUENTIAL 1-15.
(See parameter 4 for special conditions.)

2  "F" level to enter an independent variable
into the regression equation. An example
would be: *4.0*

3  "F" level to remove a variable from the
regression equation. An example would
be: *4.0*

4  This parameter should be set to 1 if the
predicted dependent variables are to be
calculated. (If this option is needed,
input data must not be from cards.) 0 or
blank if not wanted.

5  1 if constant term in equation is assumed
to equal zero (0).

6  1 if want to use weighting factor. (If a
weighting factor is used, it must be the
last variable in the input data row.)

7  1  if  intermediate  steps  of  regression  are 
not  to  be  printed. 

1  if  do  not  want  cross-product  matrix  printed 

2  if  input  data  is  in  the  following  form: 

1  N       N+l 


CORRELATION 
MATRIX 
STANDARD
N+1   DEVIATION


1  if  do  not  want  means  and  standard  deviation 
printed 

1  if  do  not  want  covariance  to  be  printed 

1  if  do  not  want  correlations  to  be  printed. 

Tolerance to be used to determine when
singularities are assumed to occur. If
this parameter is left blank, 10^-5 is used.
If it is desired to change this parameter,
the following would be used: *1.E-10*
where any number could be substituted for
the 10.

Output  (intermediate  storage)  of  coefficients. 

First  (N)  variables  are  placed  in  regression 
first. 


The  dependent  variable  must  be  the  last  variable  in  the  input  row 
(unless  a  weighting  factor  is  used,  then  the  dependent  variable  will  be  the 
next  to  the  last  variable  in  the  input  row) . 

Negative  F-ratios  may  sometimes  result  in  the  computational  procedure. 
They  should  be  considered  to  be  analytically  zero.   Frequently  negative 
F's  arise  when  input  is  a  missing  data  correlation  matrix.   Results  up  to 
the  negative  F  are  always  correct. 

The  standard  error  of  Y  given  at  each  step  is  the  standard  error  of 
predicted  Y. 

The  program  will  loop  if  a  variable  entered  at  one  step  is  removed 
at  the  very  next  step.   This  can  usually  be  corrected  by  changing  the 
F-levels  to  enter  and  remove  variables. 


October  21,  1969 



THREE  STAGE  LEAST  SQUARES  ESTIMATION 

I.  GENERAL  DESCRIPTION 

The Three Stage Least Squares Estimation program calculates three stage
least squares estimates and an asymptotic covariance matrix. A raw data
covariance matrix and a two stage least squares residual covariance matrix
are the necessary input. Calculations are carried out as in "Econometric
Theory" by Arthur S. Goldberger, pp. 347-352. The coefficients may also be
stored for use with the Econometric Reduced Form and Residual Analysis
program.

References:

Goldberger, Arthur S., Econometric Theory, New York, John Wiley and Sons,
Inc., 1964.

Johnston, J., Econometric Methods, New York, McGraw-Hill Book Company, Inc.,
1960.

II.  RESTRICTIONS 

The raw data covariance matrix must be arranged as described in the K-Class
Estimation program write-up. The program has the following size restrictions:

Total number of coefficients estimated ≤ 140.

(NEQ x NVAR) + (NEQ x NEQ) + (NVAR x NVAR) ≤ 20,000

NVAR  =  the  total  number  of  variables 
NEQ  =  the  number  of  equations  estimated 
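
A problem can be checked against these size restrictions before submission with a sketch like the one below. The function name is hypothetical, and the coefficient limit is read here as 140.

```python
def fits_three_stage(neq, nvar, n_coefficients):
    """True if a problem satisfies both stated size restrictions.

    neq            : number of equations estimated (NEQ)
    nvar           : total number of variables (NVAR)
    n_coefficients : total number of coefficients estimated
    """
    storage = neq * nvar + neq * neq + nvar * nvar
    return n_coefficients <= 140 and storage <= 20000
```

For instance, 2 equations and 10 variables need 20 + 4 + 100 = 124 storage locations, well within the limit.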

Note:   Endogenous  coefficients  are  printed  out  first,  followed  by  exogenous 
coefficients. 

III.   PARAMETERS 

The  parameters  appear  on  the  program  card  following  the  name  Three  Stage 
in  the  following  order : 

Parameter 
Number  Use  or  Meaning 

1  I.A. for raw data covariance matrix
TAPE 1-5, DISK 1-9. (I.A. is Input Address).

2  O.A. for coefficients
TAPE 1-5, DISK 1-9. (O.A. is Output Address).

3  I.A. for residual covariance matrix
TAPE 1-5, DISK 1-9. (I.A. is Input Address).

4  Number of equations to be estimated

5  Number of exogenous variables

Subparameters


For  each  equation  a  card  specifying  the  variables  in  the  equation  must 
follow  the  main  parameter  card  with  the  following  parameters : 

Parameter 
Number  Use  or  Meaning 

1  Number  of  exogenous  variables  in  the  equation 

2  Number  of  endogenous  variables  in  the  equation 

3 to N + 2  Variable numbers of the N variables included in
the equation, with exogenous variables first,
endogenous variables next, and the variable on
which the system is normalized last.

IV.  SPECIAL  COMMENTS 

The Three Stage Least Squares Estimation program requires input from
several  other  SOUPAC  programs.  The  following  is  an  example  of  the  steps 
needed  to  calculate  the  necessary  input. 

V.  EXAMPLE 

K1CLASS(T1)(T2)(0)(0)()(1)(1).
K2CLASS(T2)(T3)(T4)()(8)(2)*1.*.
(4)(2)(1)(2)(3)(4)(1)(2).
(4)(2)(5)(6)(7)(8)(2)(1).
END P
ECON(T3)(T2)()(8)()(T5).
THREE(T2)(T6)(T5)(2)(8).
(4)(2)(1)(2)(3)(4)(1)(2).
(4)(2)(5)(6)(7)(8)(2)(1).
END P

Notice that the equation control cards for both K2CLASS and Three Stage
Least Squares must be in the same order.

Also notice that an END P card is required after the equation cards.


TRANSFORMATIONS


General Description

The  TRANSFORMATION  program  serves  two  purposes  in  the  SOUPAC  system. 
First,  this  program  can  be  used  to  perform  transformations  on  data  that 
is  to  be  used  by  other  programs  in  the  system.   Second,  this  program  can 
be  used  by  itself  to  perform  computations  with  the  input  data  to  provide 
final  results. 

Restrictions 

The maximum number of variables for this program is 2000. The maximum
sample size is essentially unlimited.

No other program in the system will take more than 450 variables, so
the user is warned not to output more variables with the TRANSFORMATION
program than the next program can use. For example, if the
next program to be executed after TRANSFORMATION is CORRELATION, then the
maximum number of variables that can be output is 175.

When discussing the arrangement of data on a storage device, such as
a sequential data set, the term record is often used. A "record" refers
to a single row of data in a matrix of data. In most instances this is
synonymous with an observation over a set of variables used in a particular
study. Thus each record is composed of a set of values, with each value
corresponding to a particular variable. It should be remembered that
a record does not necessarily correspond to the data contained on one
physical card. It is possible that one observation on a set of variables
cannot be contained on one card. In that case the set of variables
continues on the next card, and so on up to a limit of
19 cards per observation or 2000 variables per observation, whichever
comes first. If, however, the set of values for one observation can fit
in the 80 columns of one card, then a "record" and a "card" are synonymous.

Before  the  execution  of  every  TRANSFORMATION  program  variables  1  through 
2000  are  set  to  zero.  After  each  row  of  data  has  been  operated  upon  by  sub- 
parameter  cards,  variables  1  through  1000  are  set  to  zero  and  variables  1001 
through  2000  keep  whatever  values  they  have  at  this  point.   This  allows  the 
user  to  accumulate  totals  within  the  TRANSFORMATION  program.  For  example, 
to  execute  a  TRANSFORMATION  program  using  a  data  deck  with  five  cards,  this 
is  what  happens:  Variables  1  through  2000  are  set  to  zero.   Then  your  first 
row  of  data  is  read  in  and  operated  on  by  X  number  of  subparameter  cards. 
Before  the  next  row  of  data  is  read  in,  variables  1  through  1000  are  set  to 
zero,  while  the  contents  of  variables  1001  through  2000  are  left  unchanged. 
This  process  continues  until  all  five  rows  of  data  are  read  in  and  operated 
upon  by  the  subparameter  cards. 
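
The reset behavior described above can be modeled with a short sketch. It is only an illustration of the stated rule (variables 1 through 1000 are cleared after each row, variables 1001 through 2000 persist); the function names and the use of a Python list indexed from 1 are assumptions.

```python
def run_transformation(rows, operate):
    """Apply operate to each input row and return the final variable list."""
    variables = [0.0] * 2001  # index 0 is unused; variables are 1..2000
    for row in rows:
        # Read the row into the first variables.
        for number, value in enumerate(row, start=1):
            variables[number] = value
        # Stand-in for the subparameter cards.
        operate(variables)
        # Before the next row, only variables 1..1000 are cleared.
        for number in range(1, 1001):
            variables[number] = 0.0
    return variables


def accumulate_total(variables):
    """Example operation: add variable 1 into variable 1001."""
    variables[1001] += variables[1]


final = run_transformation([[1.0], [2.0], [3.0], [4.0], [5.0]], accumulate_total)
print(final[1001])  # 15.0
```

Keeping the running total in a variable numbered above 1000 is what lets it survive from one row to the next.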

Parameters 

The  parameters  for  the  TRANSFORMATION  program  must  follow  the  program 
name  in  this  order: 

Parameter
Number  Use or Meaning

1  Input Address. CARDS or SEQUENTIAL 1-15.

2  Output Address. SEQUENTIAL 1-15 and/or
PRINT. The use of the characters (F)
after the word PRINT will cause your
results to be printed in F format; otherwise
your results will be printed in E
format.

3  Number of variables to be output. (The
first N variables will be output.)
1 ≤ N ≤ 2000.

Parameters 2 and 3 may be 0 or blank if not desired.

Example:

TRA(CARDS)(SEQUENTIAL 1/PRINT)(20).

The  set  of  subparameter  cards  must  be  followed  by  an  END  PROGRAM 
card. 

IV.   Special  Comments 

Output can be reached in two ways: 1) by parameter 2 on the parameter
card and 2) by the subparameter instruction OUTPUT. The OUTPUT instruction
can be used any number of times within the TRANSFORMATION program. Either
or both of these methods can be used to "output" data within the TRANSFORMATION
program.

The capital letters (A, B, C, D, etc.) used in the subparameter section
refer to variable numbers; that is, the capital letters represent some
number between 1 and 2000. Small letters refer to constants. Constants
enclosed in parentheses are integer constants. Constants enclosed in
asterisks are floating point constants. If the decimal point is not
included within the asterisks, it is assumed to be to the right of the number.
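
The two constant forms can be told apart mechanically, as in this sketch. The function name is hypothetical, and the sketch covers only single constants such as (5) or *50*, not the full subparameter syntax.

```python
def parse_constant(token):
    """Convert a constant written as (n) or *x* to a Python number.

    (n)  -> integer constant
    *x*  -> floating point constant; a missing decimal point is taken
            to lie to the right of the digits, so *50* means 50.0
    """
    if len(token) >= 3 and token.startswith("(") and token.endswith(")"):
        return int(token[1:-1])
    if len(token) >= 3 and token.startswith("*") and token.endswith("*"):
        return float(token[1:-1])
    raise ValueError("not a constant: " + repr(token))
```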

Unless  the  mnemonic  is  specified  it  is  assumed  to  be  the  first  three 
letters  of  the  operation. 
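
The default rule can be expressed as a lookup with a fallback, as sketched below. The exceptions are the mnemonics listed in the subparameter table of this write-up that are not simply the first three letters of the operation name; the function and table names are hypothetical.

```python
# Operations whose mnemonic is listed explicitly in the subparameter table.
SPECIAL_MNEMONICS = {
    "ARCCOSINE": "A-C",
    "ARCSINE": "A-S",
    "ARCTANGENT": "A-T",
    "CLOSED WHEN": "C-W",
    "COMPUTED GO TO": "C-G",
    "GO TO": "GO",
    "IF": "IF",
    "LOG BASE E": "ELO",
    "MULTIPLE COMBINE": "M-C",
    "NO OPERATION": "NOO",
    "OPEN WHEN": "O-W",
}


def mnemonic(operation):
    """Return the listed mnemonic, or else the first three letters."""
    return SPECIAL_MNEMONICS.get(operation, operation[:3])
```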

V.   Subparameters


The letters A, B, C, etc., used below refer to variable numbers.

                                                              Mnemonic

ABORT.                                                        ABO
ABSOLUTE VALUE (A) = (B).                                     ABS
ADD (A) + (B) = (C).                                          ADD
ADJUST (A)(B).                                                ADJ
ANGLE TO RADIANS (A) = (B).                                   ANG
ARCCOSINE (A) = (B).                                          A-C
ARCSINE (A) = (B).                                            A-S
ARCTANGENT (A) = (B).                                         A-T
CLOSED WHEN (A) *a* *d* (B)*g*.  See section on closed when   C-W
COMBINE (A)(B)(C)(D)(E)(F)(G)(H)(I)(J)(K).                    COM
COMPUTED GO TO (A)(a1,a2,a3,...,an).                          C-G
CONSTANTS (A)*b*.  or  (A)(b).  or  (A,B)*b**c**d**e*.        CON
COSINE (A) = (B).                                             COS
DIFFERENCE IF (A) - [*b* or (B)] "X" "Y" "Z".                 DIF
DIVIDE (A)/(B) = (C) [GO TO (D). or GO TO "D".]               DIV
EBCDIC (A)(B).                                                EBC
EXCHANGE (A) AND (B).                                         EXC
EXIT.                                                         EXI
EXPONENT BASE E (A) = (B).                                    EXP
FACTORIAL (A) = (B).                                          FAC
FIX (A) = (B).                                                FIX
FLOAT (A) = (B).                                              FLO
GO TO [(A). or "A".]                                          GO
IF (A) "X" "Y" "Z".                                           IF
INPUT FROM UNIT (input address)(number) variables.            INP
ISUM (A) THROUGH (B) = (C).                                   ISU
LAST OUTPUT ON UNIT (output address)(number) variables.       LAS
LOG BASE E (A) = (B).                                         ELO
LOG BASE 10 (A) = (B).                                        LOG
MAXIMUM (A) OR (B) = (C).                                     MAX
MINIMUM (A) OR (B) = (C).                                     MIN
MODULAR ARITHMETIC (A) MOD [*m* or (M)] (B).                  MOD
MOVE (A) TO (B).                                              MOV
MULTIPLE COMBINE (A,A')(B,B')(C,C')(D,D')(E).                 M-C
MULTIPLY (A) X (B) = (C).                                     MUL
NO OPERATION.                                                 NOO
ONE RECODE: IF (A) = [*b* or (B)] THEN (C) [*d* or (D)]
    [ELSE *e*. or ELSE (E).]                                  ONE
OPEN WHEN (A) *a* *d* (B)*g*.  See section on open when       O-W
OUTPUT ON UNIT (output address) variables (A,A',IA).          OUT
PERMUTE (START)(A)(B)(C)...(Z).                               PER
RADIANS TO ANGLE (A) = (B).                                   RAD
RAISE (A) [*b* or (B)] (C).                                   RAI
RECIPROCAL (B) = (C) [GO TO "symbolic label". or
    GO TO (statement number).]                                REC
SEPARATE (A)(B)(C)(D)(E)(F)(G)(H)(I)(J)(K).                   SEP
SIGN OF (A) = (B).                                            SIG
SINE (A) = (B).                                               SIN
SKIP ON UNIT (input address)(b) records.                      SKI
SPRAY CONSTANTS *a,b,i* (A,A',I).                             SPR
SQUARE ROOT (A) = (B).                                        SQU
SUBTRACT (A) - (B) = (C).                                     SUB
SUM (A) THROUGH (B) = (C).                                    SUM
TWO RECODE: (A) = [*b* or (B)] AND (C) = [*d* or (D)]
    THEN (E) = [*f* or (F)] [ELSE *g*. or ELSE (G).]          TWO
XADD (A) + (B) = (C).                                         XAD
XDIVIDE (A)/(B) = (C) GO TO (D).                              XDI
XIF (A) "X" "Y" "Z".                                          XIF
XMULTIPLY (A) X (B) = (C).                                    XMU
XSUBTRACT (A) - (B) = (C).                                    XSU
ZAP.                                                          ZAP

VI.   Subparameter Description

The operations below can be represented in this basic form:
OPX (A)(B)(C).
where OPX is the mnemonic and A, B, C are variable numbers:

ADD (A) + (B) = (C).
SUBTRACT (A) - (B) = (C).
MULTIPLY (A) X (B) = (C).
SUM (A) THROUGH (B) = (C).
XADD (A) + (B) = (C).            integer addition (A) + (B) = (C).
XSUBTRACT (A) - (B) = (C).       integer subtraction (A) - (B) = (C).
XMULTIPLY (A) X (B) = (C).       integer multiplication (A) X (B) = (C).
ISUM (A) THROUGH (B) = (C).      integer summation (A) THROUGH (B) = (C).
MINIMUM (A) OR (B) = (C).        var(C) = min[var(A), var(B)]
MAXIMUM (A) OR (B) = (C).        var(C) = max[var(A), var(B)]

Form for DO-notation:
OPX (A,A')(B,B')(C,C').

Form for DO-notation with specified increments:
OPX (A,A',IA)(B,B',IB)(C,C',IC).

Basic form for Indirect Addressing:
OPX (2F)(5F)(10F).


All  of  the  above  operations  can  use  DO-notation  and/or  Indirect 
Addressing  on  any  or  all  of  the  variable  numbers.   For  further  information 
on  either  of  these  options,  see  those  particular  sections. 
The operations below can be represented in this basic form:
OPX (A)(B).
where OPX is the mnemonic and A and B are variable numbers.


LOG BASE E:         ELO (A) = (B).     var(A) > 0
LOG BASE 10:        LOG (A) = (B).     var(A) > 0
SQUARE ROOT (A) = (B).                 var(A) > 0
EXPONENT BASE E (A) = (B).             var(A) < 173
SINE (A) = (B).                        var(A) must be in radians
COSINE (A) = (B).                      var(A) must be in radians
ARCSINE:            A-S (A) = (B).     -1 < var(A) < 1
ARCCOSINE:          A-C (A) = (B).     -1 < var(A) < 1
ARCTANGENT:         A-T (A) = (B).
ANGLE TO RADIANS:   ANG (A) = (B).
RADIANS TO ANGLE:   RAD (A) = (B).
ABSOLUTE VALUE (A) = (B).
SIGN OF (A) = (B).                     sign of var(A) times var(B)
MOVE (A) TO (B).
EXCHANGE (A) AND (B).
FIX (A) = (B).                         floating point to integer
FLOAT (A) = (B).                       integer to floating point
FACTORIAL (A) = (B).                   var(A) > 0
ADJUST (A)(B).

Variable (A) must be read in an A1 format. The result from ADJUST should
be put back into the same variable since TRANSFORMATION cannot print data
from an A1 format. After ADJUST variable (B) will contain:

0 to 9 = 0. to 9.
"X" punch = 11.
"Y" punch = 12.
blank = -0.
all other codes = 0.


EBCDIC (A)(B).

Variable (A) must be read in an A1 format. TABLE I indicates the values
that the various codes will be changed to.


Form for DO-notation:

OPX (A,A')(B,B').
Form for DO-notation with specified increments:

OPX (A,A',IA)(B,B',IB).
Basic form for Indirect Addressing:

OPX (2F)(5F).
All of the above operations can use DO-notation and/or Indirect Addressing
on any or all of the variable numbers.


Operations  that  use  labels 


GO  TO 


GO  "symbolic  label". 
GO  (statement  number). 


GO  "symbolic  label",  implies  that  the  next  operation  to  be  executed  is  the 
instruction  with  the  symbolic  label  which  is  the  same  as  the  symbolic  label 
in  this  statement. 

GO  (statement  number),  implies  that  the  next  operation  to  be  executed  is  the 
statement  whose  instruction  number  is  equal  to  the  statement  number. 


COMPUTED GO TO
Basic forms:

C-G (A)(a1,a2,a3,...,an).

C-G (A)"a1""a2""a3"..."an".

C-G (A)(a1)"a2""a3"(a4,...)(an).

where: (ai), 1 <= i <= n <= 24, are statement numbers.

"ai", 1 <= i <= n <= 24, are symbolic labels.

Variable (A) is a floating point number of integral value whose current
value is in the range: 1 <= var(A) <= n. If variable (A) is not of integral
value, it is truncated (all digits to the right of the decimal point are
dropped).

Control will be transferred to the symbolic label or statement number
whose position in the list (the ai's) is equal to the integral value of
variable A, i.e., to a_j where j = var(A).

Variable (A) can use the F-type of Indirect Addressing.
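
The selection rule of the computed GO TO can be illustrated outside SOUPAC. The following is a minimal Python sketch, not part of SOUPAC itself; the function name `computed_goto` and the error raised for an out-of-range value are assumptions made for illustration, since the manual does not say what happens when var(A) falls outside the list.

```python
import math


def computed_goto(value, targets):
    """Pick the branch target whose position equals the truncated value.

    `targets` is the list a1..an of statement numbers and/or labels.
    Raising ValueError for an out-of-range value is an assumption of
    this sketch; the manual does not describe that case.
    """
    index = math.trunc(value)  # digits right of the decimal point are dropped
    if not 1 <= index <= len(targets):
        raise ValueError("value does not select any target")
    return targets[index - 1]  # positions in the list are counted from 1


targets = ["YOU", "BET", "YOUR", "SWEET", "BIPPY"]
selected = computed_goto(2.7, targets)  # 2.7 truncates to 2
```

Here a variable holding 2.7 is truncated to 2, so control would pass to the second entry of the list.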


DIVIDE (A)/(B) = (C) [GO TO "symbolic label". or
                      GO TO (statement number).]

XDIVIDE (A)/(B) = (C) [GO TO "symbolic label". or
                       GO TO (statement number).]

RECIPROCAL (B) = (C) [GO TO "symbolic label". or
                      GO TO (statement number).]


Transfer to a "symbolic label" or (statement number) occurs upon an attempt
to divide by zero (var(B) = 0). If the "symbolic label" or (statement
number) is not specified and division by zero is attempted, the job is
terminated.

Basic  forms  of  DIVIDE  and  XDIVIDE: 

OPX  (A)(B)(C). 

OPX  (A)(B)(C)  "symbolic  label". 

OPX  (A)(B)(C)  (statement  number). 

where  OPX  is  the  first  three  letters  of  the  desired  operation. 

Basic  forms  of  RECIPROCAL: 

REC (B)(C).

REC  (B)(C)  "symbolic  label". 

REC  (B)(C)  (statement  number). 


Form for DO-notation:


OPX (A,A')(B,B')(C,C') "symbolic label".
REC (B,B')(C,C') "symbolic label".
Form for DO-notation with specified increments:

OPX (A,A',IA)(B,B',IB)(C,C',IC) "symbolic label".

REC (B,B',IB)(C,C',IC) "symbolic label".
Basic form for Indirect Addressing:

OPX (6F)(8F)(21F) "symbolic label".

REC (8F)(21F) "symbolic label".

DO-notation  and/or  Indirect  Addressing  can  be  applied  to  any  of  the 
forms;  these  options  can  be  used  on  any  or  all  of  the  variable  numbers. 


IF 

IF (A) (X)(Y)(Z).     where (X), (Y), (Z) are statement numbers
IF (A) "X" "Y" "Z".   where "X", "Y", "Z" are symbolic labels

A is some variable number. The variable must contain a floating point
number.

The IF operation will branch to (X) or "X" if A is negative; (Y) or
"Y" if A is zero or blank; (Z) or "Z" if A is positive. (See the special
section on BLANKS to see how to determine if variable A contains a blank.)

Note that any two of the three branches may branch to the same place:
IF (A) "X" "X" "Z".

If A is less than or equal to zero, go to the statement labeled "X". If
A is greater than zero, go to the statement labeled "Z". Symbolic labels
and statement numbers may appear as branches in the same operation.

Basic  form  for  DO-notation : 

IF  (A,A')  (X)(Y)(Z). 

If the variables A through A' are all negative, branch to (X); if all are
zero or blank, branch to (Y); if all are positive, branch to (Z). If none
of the conditions are met, the next operation is executed.
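
The all-or-nothing rule for IF with DO-notation can be sketched in Python as follows. This is an illustration only, not SOUPAC code; the function name `if_branch` and the `fall_through` argument (standing for "the next operation") are hypothetical. A blank is represented here by -0.0, which compares equal to zero.

```python
def if_branch(values, x, y, z, fall_through):
    """Return the branch taken by IF (A,A') (X)(Y)(Z).

    `values` holds the contents of variables A through A'.
    `fall_through` stands for the next operation in sequence.
    """
    if all(v < 0 for v in values):
        return x  # every variable is negative
    if all(v == 0 for v in values):
        return y  # every variable is zero or blank (-0.0 == 0 is true)
    if all(v > 0 for v in values):
        return z  # every variable is positive
    return fall_through  # mixed signs: no branch is taken


mixed = if_branch([-1.0, 0.0, 2.0], "X", "Y", "Z", "NEXT")
```

With mixed signs none of the three conditions holds, so execution simply continues with the next operation.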

Basic form for DO-notation with specified increments:

IF (A,A',IA) "X" "Y" "Z".
Basic form for Indirect Addressing:

IF (6F) (X)(Y)(Z).

If the variable referenced by the value in variable 6 is negative, go to (X);
if zero, go to (Y); if positive, go to (Z).
DO-notation and/or Indirect Addressing can be applied to all IF
operations; these options can be used on any or all of the variable
numbers. Flag and DO-notation can be combined.


DIF 

DIF (A) [*b* or (B)] (X)(Y)(Z).      where (X), (Y), (Z) are statement numbers

DIF (A) [*b* or (B)] "X" "Y" "Z".    where "X", "Y", "Z" are symbolic labels

where b is some constant and B is some variable number. A and B must
contain floating point numbers. Neither the variable B nor the constant *b*
should be blank or negative zero (-0.0), as they are considered 0. See the
special section on BLANKS to see how to determine if variable A is a blank.

The variable B or constant *b* is subtracted from variable A. The
result then determines the branch. If the result is negative, branch to
(X) or "X"; if zero or blank, branch to (Y) or "Y"; if positive, branch to
(Z) or "Z".

Basic  form  for  DO-notation: 

DIF  (A,A')(B,B')  (X)(Y)(Z). 

If the differences between (A) and (B), (A+1) and (B+1), ..., (A') and (B')
are all negative, branch to (X); if all are zero, branch to (Y); if all are
positive, branch to (Z).

Basic form for DO-notation with specified increments:

DIF (A,A',IA)(B,B',IB) "X" "Y" "Z".
Basic form for Indirect Addressing:

DIF (5F)(6F) "X" "Y" "Z".

DO-notation and/or Indirect Addressing can be applied to any of the basic
forms.


XIF 

XIF (A) (X)(Y)(Z).     where (X), (Y), (Z) are statement numbers
XIF (A) "X" "Y" "Z".   where "X", "Y", "Z" are symbolic labels

XIF has the same general syntax as IF. Variable A must be an integer
variable.


General  Operations: 
MODULAR ARITHMETIC

Variable A modulus [*b* or (C)] becomes variable D.

Basic form:

MOD (A) [*b* or (C)] (D).

Where: b is a floating point number of integral value;

       C is a variable number whose contents is a floating point
       number of integral value;

       the brackets, [], imply an either/or situation.

If b or the contents of variable C is not a floating point number of
integral value, then the number is truncated (the digits to the right
of the decimal point are dropped).


RAISE (A) [*b* or (C)] (D).

Raise variable A to the b-th power (or to the power held in variable C)
and store the result in D.

Where: b is a floating point number.

Basic form:

RAI (A) [*b* or (C)] (D).

Forms for DO-notation:

OPX (A,A')*b*(D,D').
OPX (A,A')(C,C')(D,D').

Forms for DO-notation with specified increments:

OPX (A,A',IA)*b*(D,D',ID).

OPX (A,A',IA)(C,C',IC)(D,D',ID).
Form for Indirect Addressing:

OPX (7F)*b*(13F).

DO-notation  and/or  Indirect  Addressing  can  be  used  on  any  or  all  of 
the  variable  numbers. 
Data Manipulative Operations:

PERMUTE

PER (START)(A)(B)(C)(D)(E)(F).

where  1  <  START  <  2000 

As  many  as  (2001  -  START)  variables  can  be  permuted;  this  reordering 
will  begin  at  location  START. 

Basic  form: 

PER (START)(A)(B)(C).
PER (101)(3)(4)(5).

After this operation was executed

variable number 101 would contain contents of variable 3
variable number 102 would contain contents of variable 4
variable number 103 would contain contents of variable 5.

Form  for  using  DO-notation: 

PER (START)(A,A')(B,B').
PER (101)(3,5).

This  example  does  the  same  thing  as  the  above  example. 
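
The effect of PERMUTE can be sketched in Python. This is an illustration under stated assumptions, not SOUPAC code: the variables are modeled as a dictionary keyed by variable number, the function name `permute` is hypothetical, and reading all source values before writing any of them is an assumption, since the manual does not say how overlapping source and destination locations are handled.

```python
def permute(variables, start, sources):
    """Copy the listed source variables into consecutive locations.

    `variables` maps variable numbers to values; `sources` is the
    list of variable numbers named after START, already expanded
    from any DO-notation.
    """
    if not 1 <= start <= 2000 or start + len(sources) - 1 > 2000:
        raise ValueError("at most (2001 - START) variables can be permuted")
    values = [variables[number] for number in sources]  # read first (assumed)
    for offset, value in enumerate(values):
        variables[start + offset] = value


# Equivalent of PER (101)(3)(4)(5). or PER (101)(3,5).
variables = {3: 30.0, 4: 40.0, 5: 50.0}
permute(variables, 101, [3, 4, 5])
```

After the call, locations 101 through 103 hold the former contents of variables 3 through 5, matching both forms of the example.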

DO-notation  with  specified  increments: 

PER (START)(A,A',IA)(B,B',IB)(C,C',IC).

Form  for  Indirect  Addressing:   the  only  form  of  Indirect  Addressing 
that  can  be  used  is  the  D-flag. 

PER  (START)(5D,10D)(15D,20D,2). 

PERMUTE may be used any number of times within a TRANSFORMATION
program. A PERMUTE statement may contain any or all of the above different
forms.


COMBINE 

COM (A)(B)(C)(D)(E)(F)(G)(H)(I)(J)(K)(L).

where A through L are variable numbers.

Variables A through K will be concatenated in the above order and the
new number just formed will be placed in variable L.

You may specify from 2 up to 12 variable numbers, noting that the
last variable number specified will contain the result.

DO-notation is not allowed on this operation; however, DO-notation
for this operation is available as the instruction MULTIPLE COMBINE.

Only  one  form  of  Indirect  Addressing  is  permitted: 

COM  (6F)(7F)(12F)(17F)(19F)(100F). 

where  any  or  all  of  the  variable  numbers  may  be  flagged. 


SEPARATE 

SEP (A)(B)(C)(D)(E)(F)(G)(H)(I)(J)(K).

where  contents  of  variable  A  is  the  number  to  be  separated.  Variables  B 
through  F  specify  how  the  number  is  to  be  separated  (group  1);  variables 
G  through  K  specify  where  the  results  are  to  be  put  (group  2).   Group  1 
can  consist  of  1  to  5  numbers,  but  the  empty  set  of  parentheses  must  be 
there.   There  must  be  a  1  to  1  correspondence  between  group  1  and  group  2. 

DO-notation  is  not  allowed. 

Only  F-flag  notation  is  permitted. 

SEP (6F)(7F)(10F)()()()(25F)(37F)()()().
where  any  or  all  of  the  variables  may  be  flagged. 

MULTIPLE  COMBINE 

M-C (A,A')(B,B')(C,C')(D,D')(E).

where  A  through  E  are  variable  numbers. 

Variable  E  will  contain  the  result  after  all  the  numbers  have  been 
placed  adjacent  to  each  other  in  the  above  order.   You  may  have  from  one 
to  five  sets  of  parentheses.   The  last  set  of  parentheses  specifies  the 
variable  number  where  the  result  will  be  placed.   The  maximum  size  of  the 
result  is  10°5. 

If  the  contents  of  any  of  the  variable  numbers  have  digits  to  the 
right  of  the  decimal  point  these  digits  will  be  dropped.   Only  digits  to 
the  left  of  the  decimal  point  will  be  used. 

Form for DO-notation with specified increments:

M-C (A,A',IA)(B,B',IB)(C,C',IC)(D,D',ID)(E).
There is no DO-notation on the last variable.
Form for Indirect Addressing:

M-C (6F,10F)(9F,13F)(21F,25F)(19F).

The  F-flag  is  allowed  on  all  of  the  variable  numbers,  while  the  D-flag 
is  allowed  on  all  of  the  variable  numbers  except  the  last  one. 

SKIP  ON  UNIT 

SKI  (input  address) (B). 

where  the  input  address  is  SEQUENTIAL  1-15  or  CARDS.   B  is  a  variable 
number. 

The  contents  of  B  must  be  an  integral  floating  point  number. 

The  statement  SKIP  implies  that  on  the  specified  unit,  the  number 
of  records  specified  by  variable  B  be  skipped.   (A  record  is  synonymous 
with  a  row  and  an  observation). 

Only  one  form  of  Indirect  Addressing  may  be  used: 

SKI  (S  1)(6F). 

DO-notation  is  not  allowed  on  this  operation. 

INPUT  FROM  UNIT 

INP  (input  address) (number) . 
This  operation  is  yet  to  be  implemented. 

OUTPUT  ON  UNIT 

OUT  (output  address)(A,A',IA). 

where the output address is SEQUENTIAL 1-15 and/or PRINT.

By  using  the  OUTPUT  instruction  certain  computation  results  at  various 
places  within  the  program  can  be  output  immediately.   If  the  same  output 
address  is  used  more  than  once,  the  number  of  variables  to  be  output  must 
be  the  same. 

In particular,

OUT (S 1)(5,10).
OUT (S 1)(105,110).
Recode Operations:
ONE RECODE

ONE if (A) = [*b* or (B)] then (C) = [*d* or (D)] [ELSE *e*. or ELSE (E).]


Where:   b,  d,  and  e  are  floating  point  numbers. 
A,  B,  C,  D,  and  E  are  variable  numbers. 
The  brackets,  [  ]  ,    imply  an  either/or 
situation.   (Thus,  there  are  12  combinations 
for  this  recode  operation.) 


Basic forms:

ONE (A)*b*(C)*d*.
ONE (A)(B)(C)*d**e*.
ONE (A)(B)(C)(D)(E).

Form  for  DO-notation: 

ONE  (A,A')*b*(C,C')*d*. 

This  statement  checks  to  see  if  variable  A  is  equal  to  the  constant,  b,  and 
if  it  is  variable  C  becomes  the  constant  d;  otherwise  nothing  happens.  Next 
a  check  is  made  to  see  if  variable  (A  +  1)  is  equal  to  the  constant  b.   If 
they  are  equal  variable  (C  +  1)  becomes  the  constant  d;  otherwise  nothing 
happens.   This  process  continues  until  one  of  the  DO's  is  completed. 
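
The looping behaviour of ONE RECODE under DO-notation can be sketched in Python. This is an illustration only: the function name `one_recode` and the dictionary model of the variables are hypothetical, and the special treatment of blanks (-0.0) described in the BLANKS section is deliberately not modeled.

```python
def one_recode(variables, a_numbers, b, c_numbers, d):
    """Sketch of ONE (A,A')*b*(C,C')*d*. with no ELSE part.

    `a_numbers` and `c_numbers` are the variable numbers produced by
    the two DO's. zip() stops at the shorter list, which mirrors
    "until one of the DO's is completed".
    """
    for a, c in zip(a_numbers, c_numbers):
        if variables[a] == b:
            variables[c] = d  # a match: the target becomes the constant d
        # no match and no ELSE: nothing happens


# Hypothetical equivalent of ONE (1,3)*9.*(11,13)*0.*.
variables = {1: 9.0, 2: 5.0, 3: 9.0, 11: 1.0, 12: 2.0, 13: 3.0}
one_recode(variables, range(1, 4), 9.0, range(11, 14), 0.0)
```

Variables 11 and 13 are set to the constant because variables 1 and 3 match; variable 12 is left alone.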

Form  for  DO-notation  with  specified  increments: 

ONE (A,A',IA)(B,B',IB)(C,C',IC)*d**e*.

This statement checks to see if variable A is equal to variable B. If they
are equal variable C becomes the constant d; otherwise variable C becomes
the constant e. Next a check is made to see if variable (A + IA) is equal to
variable (B + IB). If they are equal variable (C + IC) becomes the constant
d; otherwise variable (C + IC) becomes the constant e.

Form for Indirect Addressing:

ONE (100F)*b*(150F)*d*(189F).

DO-notation and/or Indirect Addressing can be applied to any of the basic
forms. These options can be used on any or all of the variable numbers.


TWO  RECODE 

TWO if (A) = [*b* or (B)] and (C) = [*d* or (D)] then (E) = [*f* or (F)]
    [ELSE *g*. or ELSE (G).]


Where: b, d, f, and g are floating point numbers.

A, B, C, D, E, F, and G are variable numbers.

The brackets, [ ], imply an either/or situation.

(Thus, there are 24 combinations for this recode operation.)


Basic forms:

TWO (A)*b*(C)*d*(E)*f**g*.
TWO (A)(B)(C)*d*(E)*f**g*.
TWO (A)(B)(C)(D)(E)(F)(G).
Form for DO-notation:

TWO (A,A')*b*(C,C')*d*(E,E')*f**g*.
Form for DO-notation with specified increments:

TWO (A,A',IA)*b*(C,C',IC)*d*(E,E',IE)*f*.
Form for Indirect Addressing:

TWO (6F)*b*(10F)*d*(12F)*f**g*.

DO-notation and/or Indirect Addressing can be applied to any of the
basic forms. These options can be used on any or all of the variable
numbers.
CLOSED WHEN

C-W (A)*a* *d* (B)*g*.

that is, when *a* <= variable A <= *d*, variable B = *g*; otherwise nothing
happens.

OPEN WHEN

O-W (A)*a* *d* (B)*g*.

that is, when *a* < variable A < *d*, variable B = *g*; otherwise nothing
happens.

Basic  forms : 

OPX  (A)*a*  *d*  (B)*g*. 

OPX  (A)*a,b,c*  *d,e,f*(B)*g,h,i*. 

Where:   OPX  is  the  mnemonic. 

a,  b,  c,  d,  e,  f,  g,  h,  and  i  are  floating  point  numbers. 

The statement, C-W (A)*a,b,c* *d,e,f*(B)*g,h,i*., implies:
if a <= variable A <= d, variable B becomes g; otherwise, if a + c <=
variable A <= d + f, variable B becomes g + i; otherwise ...

This process continues until a >= b or d >= e or g >= h.
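
The stepping of the three constant triples can be sketched in Python. This is only one reading of the rule, offered under stated assumptions: the lower bound, upper bound, and result start at a, d, and g and advance by c, f, and i; the end values b, e, and h are treated as inclusive; and the increments are required to be positive. The function name `closed_when` is hypothetical.

```python
def closed_when(value, a, b, c, d, e, f, g, h, i):
    """Sketch of C-W (A)*a,b,c* *d,e,f*(B)*g,h,i*. for one variable.

    Returns the constant that variable B would receive, or None when
    the value falls in none of the ranges (nothing happens).
    Treating b, e, and h as inclusive limits is an assumption.
    """
    if c <= 0 or f <= 0 or i <= 0:
        raise ValueError("this sketch assumes positive increments")
    low, high, result = a, d, g
    while low <= b and high <= e and result <= h:
        if low <= value <= high:  # closed interval: end points count
            return result
        low, high, result = low + c, high + f, result + i
    return None


# Ranges 0-9, 10-19, 20-29 recoded to 1, 2, 3.
code = closed_when(15.0, 0.0, 20.0, 10.0, 9.0, 29.0, 10.0, 1.0, 3.0, 1.0)
```

A value of 15 lies in the second range (10 to 19), so the result is the second constant, 2.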

Forms  for  DO-notation: 

OPX (A,A')*a* *d*(B,B')*g*.

OPX (A,A')*a,b,c* *d,e,f*(B,B')*g,h,i*.

Forms for DO-notation with specified increments:

OPX (A,A',IA)*a* *d* (B,B',IB)*g*.

OPX (A,A',IA)*a,b,c* *d,e,f*(B,B',IB)*g,h,i*.

The statement, C-W (A,A',IA)*a,b,c* *d,e,f*(B,B',IB)*g,h,i*.,
implies: variable A is checked to see if it is between any of the ranges
(a to d, (a + c) to (d + f), ...); variable B becomes the
corresponding constant (g, g + i, ...) depending upon which range
variable A was in. Variable (A + IA) is checked to see if it is between
any of the ranges (a to d, (a + c) to (d + f), ...); variable
(B + IB) becomes the corresponding constant. This process continues until
any of the DO's are completed.

Forms  for  Indirect  Addressing 

OPX  (6F)*a*  *d*  (20F)*g*. 

OPX  (6F)*a,b,c*  *d,e,f*(20F)*g,h,i*. 
CONSTANTS

Basic forms:

CON (A)*b*.  where b is a floating point number

CON (A)(c).  where c is an integer

Forms  for  DO-notation: 

CON  (A,A')*b*. 

CON  (A,A')(c). 

CON (A,A')*a1**a2**a3*...*an*.        0 < n < 46

CON (A,A')(a1)(a2)(a3)...(an).        0 < n < 46

The statement, CON (A,A')*b*., causes the floating point number, *b*,
to be put into variable numbers A through A'.

The statement, CON (A,A')*a1**a2**a3*...*an*., causes the floating point
numbers, a1 through an, to be put into variable numbers A through A'. This
statement is finished when either the DO is completed or the string of
constants is exhausted.
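
The two stopping conditions of CON with a list of constants can be sketched in Python. This is an illustration only; the function name `con` and the dictionary model of the variables are hypothetical and not part of SOUPAC.

```python
def con(variables, first, last, constants, step=1):
    """Sketch of CON (A,A',IA)*a1**a2*...*an*.

    zip() ends as soon as either the DO (first..last by step) or the
    list of constants runs out, whichever happens first.
    """
    for number, constant in zip(range(first, last + 1, step), constants):
        variables[number] = constant


# Four constants, seven variables: the constants are exhausted first.
variables = {}
con(variables, 6, 12, [1.2, 1.9, 2.9, 0.5])
```

Only variables 6 through 9 receive values here, because the string of four constants is exhausted before the DO over 6 through 12 is completed.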

Forms  for  DO-notation  with  specified  increments : 

CON (A,A',IA)*b*.

CON (A,A',IA)(c).

CON (A,A',IA)*a1**a2**a3*...*an*.       0 < n < 46

CON (A,A',IA)(a1)(a2)(a3)...(an).       0 < n < 46

Forms for Indirect Addressing:

CON (6F)*1.5*.

CON (6F,12F)*1.2**1.9**2.9**.5*.

SPRAY  CONSTANTS 

Put  constants  *a*   through  *b*  in  increments  of  *i*  into  variable  numbers 
A  through  A ' . 

Basic  form: 

SPR *a,b,i* (A,A').

where a, b, and i are floating point numbers. The increment, i, may be
omitted if i = 1.
Form for DO-notation with specified increments:

SPR *a,b,i*(A,A',IA).

Form  for  Indirect  Addressing: 

SPR  *a,b,i*(3F,7F). 
SPR  *a,b,i*(7D,15D). 
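
The filling rule of SPRAY CONSTANTS can be sketched in Python. This is a sketch under stated assumptions rather than SOUPAC's own behaviour: it assumes the statement stops as soon as either the generated constant would exceed b or the DO over the variables is completed, which the manual does not spell out. The function name `spray` is hypothetical.

```python
def spray(variables, a, b, i, first, last, step=1):
    """Sketch of SPR *a,b,i*(A,A',IA).

    Puts a, a+i, a+2i, ... (not exceeding b) into variables
    first, first+step, ... (not exceeding last). Stopping at
    whichever limit is reached first is an assumption.
    """
    if i <= 0:
        raise ValueError("this sketch assumes a positive increment")
    for k, number in enumerate(range(first, last + 1, step)):
        value = a + k * i  # multiply rather than accumulate rounding error
        if value > b:
            break
        variables[number] = value


# Hypothetical equivalent of SPR *1,5*(3,7). with the default increment 1.
variables = {}
spray(variables, 1.0, 5.0, 1.0, 3, 7)
```

Here the five constants 1 through 5 land in variables 3 through 7.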
Operations Involving No Parameters:

ABORT 

The  statement  ABO.  causes  immediate  termination  of  the  TRANSFORMATION 
program  and  all  subsequent  SOUPAC  programs. 


EXIT 

The  statement  EXI.  causes  immediate  termination  of  the  TRANSFORMATION 
program,  but  permits  the  execution  of  all  subsequent  SOUPAC  programs.   All 
data  that  has  been  output  on  sequential  storage  devices  up  to  the  execution 
of  this  statement  is  available  for  use  in  the  subsequent  programs. 


NO  OPERATION 

NOO. is a dummy statement that may be placed anywhere in the
TRANSFORMATION program without affecting the sequence of execution.


ZAP

The statement ZAP. zeroes out locations 1001 through 2000.




LAST  CARD  OPERATION 


This  option  allows  instructions  to  be  performed  after  the  last  row 
of  data  has  been  read  in  and  processed.   The  LAST  subparameter  divides 
a  program  into  regular  and  last  card  segments. 

Regular transformations are executed for every row of data read;
last card transformations are executed only once, after all data is
read and all regular transformations performed. Note that any
TRANSFORMATION subparameters may be used in any regular or last card segment
except the subparameter LAST (or LAST followed by its appropriate
parameters - see list of subparameters). LAST signals the beginning of the
last card segment and may occur only once in a transformation program,
after all regular transformations. The user is warned against transfers
into or out of the last card segment. Also, whatever values had been in
variables 1 - 1000 are zeroed out before the start of the last card
operations, but variables 1001 - 2000 retain whatever values had been in them.

An example of the use of last card transformations is the calculation
of variable means, where it is necessary to read all rows of data, keep
a running sum of the items to be averaged and possibly a row count; then,
after all rows are processed, a last card operation divides by the sample
size (row count). The last card segment would then contain the divide
statement and any output statements that are required. Note that with
the last card feature it is not necessary to know ahead of time how
many rows are to be averaged, and special data values signalling the last
row are also not necessary. Consider the use of LAST in the following
examples: Determine the maximum and minimum over each of four columns
of data

TRA (C).

IF (1011) "B" "A" "B".

"A" CON (1011)*1*.

PER (1001)(1,4)(1,4).

GO TO "N".

"B" MAX (1,4)(1001,1004)(1001,1004).

MIN (1,4)(1005,1008)(1005,1008).

"N" NO OP.

LAST.

OUT (P)(1001,1008).

END P

Calculate row and column means over five columns of data (print input,
row means, column means, and number of rows)

TRA (C)(P)(6).

CON (6,7)*1**5*.

ADD (1,6)(1001,1006)(1001,1006).

SUM (1)(5)(8).

DIV (8)(7)(6).

LAST (P)(6).      NOTE: This PRINT occurs after the MOVE operation below.

DIV (1001,1005)(1006)(1,5).

MOVE (1006)(6).

END P
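
The division of work between the regular segment and the last card segment in the second example can be mirrored in Python. This is a sketch for illustration only, not a translation SOUPAC would accept; the function name `row_and_column_means` is hypothetical, and the handling of an empty input is an assumption.

```python
def row_and_column_means(rows):
    """Mimic the regular / last card split of the means example.

    Returns the printed lines: one per input row (the row plus its
    mean), then a final line of column means plus the row count.
    """
    column_sums = None
    row_count = 0
    printed = []
    for row in rows:  # regular segment: runs once per row of data
        if column_sums is None:
            column_sums = [0.0] * len(row)
        column_sums = [s + x for s, x in zip(column_sums, row)]
        row_count += 1
        printed.append(list(row) + [sum(row) / len(row)])
    if row_count == 0:  # assumption: no data means no last-card output
        return printed
    # last card segment: runs once, after all rows are processed
    printed.append([s / row_count for s in column_sums] + [row_count])
    return printed


lines = row_and_column_means([[1.0, 2.0, 3.0, 4.0, 5.0],
                              [3.0, 4.0, 5.0, 6.0, 7.0]])
```

As in the SOUPAC version, the number of rows does not need to be known in advance; the running sums are only divided once the input is exhausted.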
BLANKS


Floating point data fields which contain physical blank spaces (i.e.,
nothing is punched in them) are translated into -0.0 (negative zero) by
SOUPAC. This value, -0.0, is transferred from one SOUPAC program to
another in the same way as any other number. Values of -0.0 may be created in
TRANSFORMATION for use in other SOUPAC programs as blanks. Note that -0.0
punched on a data card is read by SOUPAC, and cannot be differentiated from
a blank, which is also a -0.0.

The IF and DIF operations will not distinguish between 0.0 and -0.0.
The ONE and TWO RECODE operations will determine if a variable is -0.0 as
distinguished from other values including zero (0). However, the RECODE
operations will not differentiate between a 0.0 and -0.0 if zeroes are
being checked for.
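
The distinction between an ordinary comparison and a sign-aware test can be illustrated in Python. This sketch uses Python's IEEE 754 floats, which is an assumption for illustration and not the number representation of the machine SOUPAC ran on; the function name `is_blank` is hypothetical.

```python
import math


def is_blank(value):
    """Return True only for negative zero, the stand-in for a blank.

    An ordinary comparison cannot do this: -0.0 == 0.0 is true,
    which is the same limitation described for IF and DIF.
    """
    return value == 0.0 and math.copysign(1.0, value) < 0


plain_comparison = (-0.0 == 0.0)  # the sign is invisible to ==
blank_detected = is_blank(-0.0)   # the sign-aware test sees it
```

The comparison treats the two zeroes as equal, while the sign-aware test separates them, much as the RECODE operations do when -0.0 is the value being checked for.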

To  determine  if  a  variable  contains  a  blank  (-0.0),  the  following  type 
of  procedure  should  be  followed : 

ONE  (A)*-0*(B)*b*. 

DIF  (B)*b*"X"  "Y"  "X". 

where  b  cannot  be  -0.0  and  A  and  B  can  be  the  same  variable.   If  a  blank 
was  in  variable  A,  the  next  statement  executed  will  be  the  statement  labeled 
"Y" . 




DO-NOTATION 


DO-notation  is  a  facility  provided  in  the  TRANSFORMATION  program  to 
enable  a  user  to  easily  and  compactly  perform  an  operation  on  a  set  of 
variables  instead  of  performing  that  operation  on  each  of  the  desired 
variables  individually.   This  concept  of  DO-notation  corresponds  to  the 
concept  of  FORTRAN  DO-loops. 

For example, suppose one wanted to evaluate the function

$\sqrt{a^2 + b^2 - 4ac}$

for a set of $a_i$, $b_i$, and $c_i$, $i = 1, \ldots, N$. Suppose further
that the set of $a_i$ was the set of variables numbered 1 through 10; the
$b_i$ were variables 11 through 20; and the set of $c_i$ were variables 21
through 30. A TRANSFORMATION program to evaluate the function for $a_1$,
$b_1$, and $c_1$ could be written as:


1.  MUL(1)(1)(500).          A squared
2.  MUL(11)(11)(510).        B squared
3.  ADD(500)(510)(520).      A squared + B squared
4.  MUL(1)(21)(530).         A x C
5.  CON(600)*4*.
6.  MUL(530)(600)(601).      4 x A x C
7.  SUB(520)(601)(701).      A squared + B squared - 4 x A x C
8.  SQU(701)(801).           square root of above


This same procedure would have to be repeated nine more times to evaluate
the function over the whole set of values; this would be a tedious card
punching task. DO-notation allows one to use the above eight statements to
loop over the sets $a_i$, $b_i$, and $c_i$ of variables for N = 10. Using
DO-notation the program segment would appear as:


1.  MUL(1,10)(1,10)(500,509).         Ai squared
2.  MUL(11,20)(11,20)(510,519).       Bi squared
3.  ADD(500,509)(510,519)(520,529).   Ai squared + Bi squared
4.  MUL(1,10)(21,30)(530,539).        Ai x Ci
5.  CON(600)*4*.
6.  MUL(530,539)(600)(601,610).       4 x Ai x Ci
7.  SUB(520,529)(601,610)(701,710).   Ai squared + Bi squared - 4 x Ai x Ci
8.  SQU(701,710)(801,810).            square root of above


Statements are numbered for reference only. The variables that are being
operated on do not have to be in order. Therefore in the above example one
might wish to operate on every other variable in the set, in which case the
program segment would be written as:


MUL(1,10,2)(1,10,2)(500,509,2).
   .
   .
   .
SQU(701,710,2)(801,810,2).
The third number enclosed in each pair of parentheses is the increment
(denoting the numerical interval between the variables). Thus, one operates
on variable one first, adds the increment two to one and the result (variable
three) is operated on next. Similarly, one adds two to three and produces
variable five, which is the next variable. The process continues until the
variable number evaluated is greater than or equal to the second number in
the parentheses. Thus, the DO statement is completed.
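
The sequence of variable numbers visited by a single DO can be sketched in Python. This is an illustration only; the function name `do_variables` is hypothetical, and the bounds check reflects the limits of 1 and 2000 on variable numbers, with both ends assumed to be allowed.

```python
def do_variables(a, a_prime, ia=1):
    """List the variable numbers visited by the DO (A,A',IA).

    Starts at A and adds the increment until the terminus A' would
    be passed. An omitted increment defaults to one.
    """
    if not (1 <= a <= a_prime <= 2000) or ia <= 0:
        raise ValueError("expected 1 <= A <= A' <= 2000 and IA > 0")
    return list(range(a, a_prime + 1, ia))


every_other = do_variables(1, 10, 2)
```

With an increment of two the DO touches variables 1, 3, 5, 7, and 9; variable 10 is never reached because the next candidate, 11, lies past the terminus.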

In  order  to  postulate  rules  for  the  use  of  DO-notation,  the  following 
notation  conventions  are  adopted:   given  a  pair  of  parenthesis  enclosing 
arguments  of  the  form  (A,A',IA),  a  DO  has  been  formed.  A  is  referred  to 
as  the  initial  value  or  the  starting  point  of  the  DO;  A'  is  called  the 
terminus  of  the  DO;  IA  is  the  increment. 

1.  The form of a DO is (A,A',IA) where 1 <= A <= A' <= 2000,
    IA > 0 and (A + n x IA) <= 2000; n is an integer of the
    value 0 <= n <= (A' - A)/IA.

2.  If the increment (IA) is not shown in the DO, the increment
    is assumed to be one (1).

3.  The general form of a statement using DO-notation is

    OPX(A,A',IA)(B,B',IB)(C,C',IC)...(N,N',IN).

    where OPX is an instruction mnemonic. There are 1 to N
    operands (pairs of parentheses) but not all operands have
    to be a DO. Statement 6 in the program segment is an example
    of this type of usage.

4.  If more than one DO is used in an instruction, the increments
    do not have to be the same. For example,

    MOV(1,10,3)(11,14).

5.  If there are two or more DO's of unequal looping (different n's)
    in a statement, processing of the instruction stops as soon
    as the first DO is completed. A DO is completed when n = k in
    A + n x IA, where k = (A' - A)/IA and 0 <= n <= k. An easy way
    to determine which DO is completed first is to:

    a)  First calculate the k_i's;

    b)  Find the minimum of the k_i's;

    c)  The subscript of the minimum k_i determines which
        DO is completed first.

    For example, ADD(1,10)(2,5)(26,32).

    a)  k1 = (10 - 1)/1 = 9; k2 = (5 - 2)/1 = 3; k3 = (32 - 26)/1 = 6.

    b)  min(k1,k2,k3) = k2 = 3

    c)  Therefore the second DO (2,5) will be completed first.
        That is, this instruction will be done after variable 4
        has been added to variable 5 and the result stored in
        variable 29.
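
The three-step procedure for finding which DO finishes first can be sketched in Python. This is an illustration only; the function name `first_completed_do` is hypothetical, and using integer (floor) division for k is an assumption that follows from n being an integer no larger than (A' - A)/IA.

```python
def first_completed_do(dos):
    """Apply steps a) to c) to a list of (A, A', IA) triples.

    Returns the k values, the 0-based position of the DO that is
    completed first, and the variable numbers used on the final pass.
    """
    ks = [(last - first) // step for first, last, step in dos]  # step a
    k_min = min(ks)                                             # step b
    position = ks.index(k_min)                                  # step c
    final_pass = [first + k_min * step for first, _last, step in dos]
    return ks, position, final_pass


# ADD(1,10)(2,5)(26,32). with every increment defaulting to one
ks, position, final_pass = first_completed_do([(1, 10, 1), (2, 5, 1), (26, 32, 1)])
```

The second DO has the smallest k, so the last addition performed uses variables 4 and 5 and stores into variable 29, in agreement with the worked example.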
SYMBOLIC LABELS


There  are  several  programming  concepts,  or  concepts  of  programming 
languages,  implicit  in  the  phrase  "symbolic  labels".  Most  importantly,  a 
label  is  some  alpha-numeric  set  of  characters,  of  finite  length,  which  is 
used  as  a  name  for  a  statement  or  instruction  of  a  programming  language. 
By  this  label  the  statement  can  be  specified  as  the  destination  of  another 
statement  which  transfers  to  a  particular  section  of  one's  program.   A 
"symbolic  label"  is  a  label  which  is  not  necessarily  restricted  to  numbers. 
The  statement  numbers  allowed  in  the  FORTRAN  language  would  not  be  considered 
symbolic  labels,  but  rather  numeric  labels.   Thus,  "$"  or  "A*+7"  or  "#->B" 
could  be  used  as  symbolic  labels,  while  (2)  or  (7)  or  (12)  would  be  numeric 
labels. 

For SOUPAC users, symbolic labels can occur only in the branching or
transfer instructions IF, GO TO, DIV, XDI, REC, and C-G. For example,
instead of writing 'GO TO (12).' which means "at this point in the
TRANSFORMATION program transfer to the statement which is the 12th sequential
subparameter statement", one could write 'GO TO "ALPHA".' If the labeled
statement is the 12th sequential statement the result will be the same.
Similarly we would write 'IF (12) "A1" "B2" "C3".' for 'IF (12)(3)(5)(7).'
where "A1" is the name of the 3rd sequential subparameter statement; "B2"
is the name of the 5th; and "C3" is the name of the 7th. Finally, one
could write 'C-G (37) "YOU" "BET" "YOUR" "SWEET" "BIPPY".' for 'C-G (37)
(25,62,37,24,19).' Here the first label "YOU" is associated with the 25th
statement in the program, "BET" is associated with the 62nd statement and
so forth.

Symbolic labels greatly increase the ease of programming a TRANSFORMATION
program.  Suppose, for example, that we had a 10-statement transformation
program in which an IF-test was used.  If only sequential statement numbers
could be used for a branching reference in a transfer statement, the IF-test
would necessarily use the sequential order of the original 10-statement
program.  If it then became necessary to add cards to the program,
the transfer statement (IF-test) would have to be changed to correspond to
the new sequential order of statements.  The use of symbolic labeling in a
program avoids this inconvenience.  The original 10-statement program could
be referenced using symbolic labels and no changes would be necessary on
transfer statements when cards were added to the program.  This situation
occurs quite often and the user is encouraged to use the symbolic
labeling option in order to avoid the problems involved.
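The point about added cards can be illustrated outside SOUPAC.  The following
is only a sketch in Python, not SOUPAC code: the names build_label_table and
resolve, and the four-statement program, are hypothetical and serve only to
show that a symbolic target follows its statement when a card is inserted,
while a numeric target does not.

```python
def build_label_table(statements):
    """Map each symbolic label to its 1-based sequential statement number."""
    table = {}
    for number, (label, _text) in enumerate(statements, start=1):
        if label is None:
            continue  # unlabeled statements can only be reached by number
        if label in table:
            raise ValueError(f"label {label!r} names more than one statement")
        table[label] = number
    return table


def resolve(target, table):
    """Return the sequential statement number a transfer target refers to."""
    if isinstance(target, int):  # numeric label such as (12)
        return target
    if target not in table:
        raise KeyError(f"no statement is named {target!r}")
    return table[target]


# Hypothetical four-statement program: (label, statement text).
program = [
    (None, "ADD (1)(2)(5)."),
    (None, 'GO TO "ALPHA".'),
    (None, "SUB (5)(3)(6)."),
    ("ALPHA", "ADD (3)(4)(7)."),
]
before = resolve("ALPHA", build_label_table(program))

# Adding a card ahead of "ALPHA" shifts its sequential number,
# but the symbolic reference needs no change.
program.insert(2, (None, "ADD (8)(9)(10)."))
after = resolve("ALPHA", build_label_table(program))
print(before, after)
```

A numeric target such as (4) would still resolve to 4 after the insertion and
would therefore point at a different statement than before.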

To  use  the  symbolic  labels,  one  must  remember  these  rules: 

1)  Labels  can  be  at  most  eight  characters  long. 

2)  Symbolic  labels  can  be  used  only  in  transfer  instructions, 
namely,  IF,  GO  TO  ,  DIV,  XDI,  REC,  and  C-G. 

3)  To  name  a  statement  symbolically  the  form  is  as  follows: 

"LABEL"  STATEMENT 

The label is enclosed in quotes.  Thus '"ALPHA" ADD (1)(2)(5).',
where ALPHA is the name of the statement, adds variable 1 to
variable 2 and stores the result in variable 5.

4)  Any statement can be named or symbolically labeled.

5)  To any name there is associated a unique statement;
i.e., two or more statements occurring in different
parts of the program cannot have the same name.  The
same statement occurring in different parts of the
program may have different labels.  For example,
suppose the 2nd parameter statement and the 10th
parameter statement were both 'ADD (3)(4)(5F).'.
Both of these statements cannot be named "B" since
a 'GO TO "B".' statement would not know which statement
labeled "B" to go to.  But, the second statement
could be called "C37" and the tenth "BETA" without
confusion.  It should be remembered that 'GO TO "C37".'
and 'GO TO "BETA".' do not have the same meaning.  The
first transfers to the second statement and the last
transfers to the tenth statement, although both "C37"
and "BETA" do the same thing.

6)  A  label  cannot  be  used  if  it  does  not  occur  as  the 
name  of  a  statement.   It  makes  no  sense  to  have  a 
statement  'GO  TO  "K9" . '  if  there  is  no  statement 
named  "K9" . 

7)  It makes no difference whether the referenced
statement is before or after the transfer statement
as long as some statement has such a name.
Also, one can name a statement and never reference
it by that name.

8)  The same name or label can be referenced in as many
transfer statements and as often as one wishes.

9)  Numeric  characters  may  be  used  as  symbolic  labels 
provided  they  are  enclosed  in  quotes.  For  example, 
'GO  TO  "12".'  means  go  to  the  statement  with  the 
label  "12". 

10)  Numeric characters enclosed in parentheses are numeric
labels.  These numeric labels when used in a transfer
instruction imply going to the nth sequential
subparameter card, where n is this numeric character.

11)  A label in a transfer instruction may have the special
forms "*+n" and "*-n" where n is an integer number.
This is actually a form of an absolute statement number
and does not require that there be a statement with that label.
In fact, a statement with the label "*+n" is for all
practical purposes considered to be unlabelled.  If n
is an integer the interpretation is as follows:

If the sign is positive:  transfer to the nth statement after the
one in which this special form occurred.

If the sign is negative:  transfer to the nth statement
before the one in which the special form occurred.

"*+n" when n is some character other than an integer number
is an ordinary label.  For example "*+a" is an ordinary
label while "*+2" is a special label.


Warnings

The user should be careful not to use the special forms of labels in
such a manner that the instruction being transferred to is not in the
program.  One cannot, for example, expect to transfer to "*-5" from the
fourth subparameter statement, nor can one transfer to "*+3" from the
second to last subparameter statement.  Finally, the special label
"*+0" (asterisk plus zero) is an error.
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INDIRECT  ADDRESSING 


Consider the statement ADD (A)(B)(C).  We refer to A, B, and C as variables
and interpret the statement to mean add variable A to variable B and
store the result in variable C.  We could also think of A, B, and C as
locations in the program or in the machine; then we would interpret it to
mean add the contents of A to the contents of B and store the result in C.
The latter interpretation is more descriptive of actual machine operation.
For example, if we wish to add gross income (A) and consulting fees (B) to
get total income (C) we would actually add the contents of a location called
gross income to the contents of a location called consulting fees, that is,
the dollar amounts referred to by these location names.

Referring to our example, ADD (A)(B)(C)., we may know immediately what
A, B, and C are, for example ADD (10)(15)(20)., or we may wish the location
numbers themselves to be calculated by the program.  The function of Indirect
Addressing is to enable us to vary the location of a value to be used.  For
example, if gross income is variable 10 and variable 5 has the value 10, we
could say add the variable which has the location indicated by variable 5 to
the other.  Obviously variable 5 can then be changed (e.g. incremented) by
subsequent program statements but the form of the ADD need not.  (As subsequent
rows of data are processed [or in a loop], different values of variable 5 may
be used.)

The simplest form of Indirect Addressing is called F-flag; (5F) or (100F)
are examples of the F-flag.  The F in (5F) indicates that the location to be
used in the current operation is the value stored in location 5.  ADD (5F)(15)
(20). means get the number in variable 5 and use that as the location of the
variable to be added to variable 15 and stored in variable 20.  In the example
of the preceding paragraph, variable 5 had the value 10, so variable 10 would
be added to variable 15.

Hereafter we will use the following notation:  ADD (5)(7)(23). means
c(5) + c(7) = c(23) where c(A) is taken to mean the contents of A.  Clearly
this amounts to adding variable 5 to 7 and storing the result in 23.
ADD (5)(7F)(23). is then c(5) + c(c(7)) = c(23).  Notice that the evaluation
of nested parentheses is from the innermost to the outermost pair of
parentheses.  The value of the innermost pair of parentheses must be some variable
number and computer convention dictates that this innermost number be in integer
mode.  Note that fractions and mixed numbers are not possible in this mode.

DO-notation and F-flag can be used together.  Thus, ADD (100F,105F)(200F,
205F)(300F,305F). is interpreted as follows:  the range of each DO is 6,
namely 100 to 105, 200 to 205, and 300 to 305.  But each of these variables
in the range of each DO is F-flagged.  Thus, variables 101 through 104, 201
through 204, and 301 through 304 must have integral values, not only 100, 105,
200, 205, 300, and 305.

We  illustrate  by  these  examples: 

ADD (100,105)(200,205)(300,305). means
     c(100) + c(200) = c(300).
     c(101) + c(201) = c(301).
        .
        .
     c(105) + c(205) = c(305).

But, ADD (100F,105F)(200F,205F)(300F,305F). means
     c(c(100)) + c(c(200)) = c(c(300)).
     c(c(101)) + c(c(201)) = c(c(301)).
        .
        .
     c(c(105)) + c(c(205)) = c(c(305)).

The  important  thing  to  notice  in  the  above  example  is  that  the  ranges  of 
the  DO's  are  determined  by  the  range  of  the  variable  numbers,  whether  the 
variable  numbers  are  flagged  or  not.  Also  any  integral  values,  k,  are 
legal,  1  <  k  <   2000,  as  values  of  flagging  variables. 

The other form of Indirect Addressing is the D-flag.  The D-flag is
similar to the F-flag with this difference:  only the end points of the
DO-notation are used and need be set as flagging variables.  Consider the
example SUB (100D,23D)(47D,1001D)(1D,50D).  First, the listed variable
numbers do not determine the range of the DO; the range of the DO is
determined by the construction (c(100), c(23)), (c(47), c(1001)) and
(c(1), c(50)) where, if c(100) = 1, c(23) = 5, c(47) = 6, c(1001) = 10,
c(1) = 11, and c(50) = 15, this would amount to:  SUB (1,5)(6,10)(11,15).
Secondly, notice that the variables in the range of the DO do not need to
be flagged in any way.  Finally, the F-flag with DO-notation produces a DO
on a set of flagged variables while the D-flag is an ordinary DO determined
by indirectly addressed initial value and final value.

Consider the following:

let   c(1) = 2      c(5) = 29     c(9)  = 62     c(13) = 77
      c(2) = 5      c(6) = 47     c(10) = 68     c(14) = 83
      c(3) = 7      c(7) = 51     c(11) = 71     c(15) = 94
      c(4) = 13     c(8) = 55     c(12) = 75

Then the statement ADD (1F,5F)(6F,10F)(11F,15F). would decompose into the
following:

     ADD (2)(47)(71).
     ADD (5)(51)(75).
     ADD (7)(55)(77).
     ADD (13)(62)(83).
     ADD (29)(68)(94).

But ADD (1D,5D)(6D,10D)(11D,15D). amounts to:  ADD (2,29)(47,68)(71,94).
which is to be interpreted as ordinary DO-notation.

Notice that ADD (5F)(7F)(20F). is the same thing as ADD (5D)(7D)(20D).,
that is, the F-flag and D-flag are the same when used on a single variable.
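The difference between the two flags can be checked mechanically.  The sketch
below is Python, not SOUPAC; the function names expand_f and expand_d and the
dictionary `contents` (standing for c(A)) are assumptions introduced for
illustration.  It reproduces the decomposition of the example above.

```python
def expand_f(contents, do_ranges):
    """F-flag with DO-notation: every variable in each range is a pointer,
    so each step uses c(v) as the actual variable number."""
    ranges = [range(first, last + 1) for first, last in do_ranges]
    return [tuple(contents[v] for v in step) for step in zip(*ranges)]


def expand_d(contents, do_ranges):
    """D-flag: only the two end points are pointers; the result is an
    ordinary DO from c(first) to c(last)."""
    return [(contents[first], contents[last]) for first, last in do_ranges]


# c(1) ... c(15) as given in the example.
contents = {1: 2, 2: 5, 3: 7, 4: 13, 5: 29, 6: 47, 7: 51, 8: 55,
            9: 62, 10: 68, 11: 71, 12: 75, 13: 77, 14: 83, 15: 94}

do_ranges = [(1, 5), (6, 10), (11, 15)]
f_steps = expand_f(contents, do_ranges)    # one ADD per tuple
d_ranges = expand_d(contents, do_ranges)   # one ordinary DO per pair

for a, b, c in f_steps:
    print(f"ADD ({a})({b})({c}).")
print("ADD " + "".join(f"({lo},{hi})" for lo, hi in d_ranges) + ".")
```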
                                      TABLE I

                  (A)          (B)                                 (A)          (B)
CHARACTER      CARD CODE     EBC CODE      CHARACTER            CARD CODE     EBC CODE

    0              0            0.         X                    0-7            34.
    1              1            1.         Y                    0-8            35.
    2              2            2.         Z                    0-9            36.
    3              3            3.         '                    8-5            37.
    4              4            4.         "                    8-7            38.
    5              5            5.         .                    12-8-3         39.
    6              6            6.         ,                    0-8-3          40.
    7              7            7.         ;                    11-8-6         41.
    8              8            8.         :                    8-2            42.
    9              9            9.         ?                    0-8-7          43.
  blank          blank         10.         !                    11-8-2         44.
    A            12-1          11.         +                    12-8-6         45.
    B            12-2          12.         -                    11             46.
    C            12-3          13.         (                    12-8-5         47.
    D            12-4          14.         )                    11-8-5         48.
    E            12-5          15.         $                    11-8-3         49.
    F            12-6          16.         *                    11-8-4         50.
    G            12-7          17.         =                    8-6            51.
    H            12-8          18.         /                    0-1            52.
    I            12-9          19.         #                    8-3            53.
    J            11-1          20.         %                    0-8-4          54.
    K            11-2          21.         |  (vertical bar)    12-8-7         55.
    L            11-3          22.         ¢                    12-8-2         56.
    M            11-4          23.         <                    12-8-4         57.
    N            11-5          24.         >                    0-8-6          58.
    O            11-6          25.         ¬  (logical bar)     11-8-7         59.
    P            11-7          26.         @                    8-4            60.
    Q            11-8          27.         &                    12             61.
    R            11-9          28.         _  (underscore)      0-8-5          62.
    S            0-2           29.
    T            0-3           30.
    U            0-4           31.
    V            0-5           32.
    W            0-6           33.
T-TEST

I.   General Description

The T-TEST program calculates a T coefficient or F ratio as described
below:

Suboperation (1):  Paired T-Test (also called correlated T-Test).
Variables are in a row; variable one is paired with variable two, three with
four, etc., and a paired T coefficient is calculated for each pair as
follows:

     $$ t = \bar{d} / S_{\bar{d}} $$

     $$ \bar{d} = \sum_{i=1}^{N_j} [X_{\alpha i} - X_{\beta i}] / N_j $$

     $$ S_{\bar{d}} = \left[ \frac{ N_j \sum_{i=1}^{N_j} [X_{\alpha i} - X_{\beta i}]^2 - \left( \sum_{i=1}^{N_j} [X_{\alpha i} - X_{\beta i}] \right)^2 }{ N_j^2 \cdot f } \right]^{1/2} $$

where $\alpha$ and $\beta$ are the jth pair of variables, f = degrees of
freedom = $N_j$ or $N_j - 1$ as desired, and $N_j$ is the sample size for the
jth pair of variables.

Suboperation (2):  Paired T-Test for all possible combinations of
variables computed as in Suboperation (1).
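As a check on the formulas of Suboperation (1), the following Python sketch
computes the paired t directly from the sums.  It is an illustration only, not
the SOUPAC routine; the function name paired_t and the flag use_n (standing
for the choice of f = N or N-1) are assumptions.

```python
import math


def paired_t(x_a, x_b, use_n=False):
    """Paired t for two equally long columns, following
    t = d_bar / S_dbar with S_dbar^2 = (N*sum(d^2) - (sum d)^2) / (N^2 * f)."""
    n = len(x_a)
    d = [a - b for a, b in zip(x_a, x_b)]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    f = n if use_n else n - 1          # degrees of freedom, N or N-1
    d_bar = sum_d / n
    s_dbar = math.sqrt((n * sum_d2 - sum_d ** 2) / (n * n * f))
    return d_bar / s_dbar


t = paired_t([3, 5, 7, 9], [1, 2, 3, 4])
print(round(t, 4))
```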

Suboperation (3):  Test of differences from a known population mean.
A population mean must be provided for each column of data or the
mean will be set to zero.  Population means should be provided as a
row vector.  The following are calculated and printed for each variable:

     t value:   $t = (\bar{X} - \mu) / S_{\bar{x}}$,
                where $\mu$ = parameterized value or zero

     Mean:   $\bar{X} = \sum_{i=1}^{N} X_i / N$

     Standard Deviation:   $S.D. = \left[ \frac{ N \sum X_i^2 - (\sum X_i)^2 }{ N \times (d.f.) } \right]^{1/2}$

     NOTE:  N-1 is the usual degrees of freedom, but N may be specified.

     Standard Error of Mean:   $S_{\bar{x}} = S.D. / \sqrt{N}$

Suboperation (4):  Test of differences from a known population mean for
previously analyzed data:  the mean ($\bar{X}$), standard deviation (S.D.), and
sample size (N) as well as the population mean [see Suboperation (3)] are
read in and the program computes the standard error of the mean and T
for each trio of $\bar{X}$, S.D., and N read in that order.  If R trios of
data are used (R observations on rows occur) then R population means
should be given.  Calculations are the same as in Suboperation (3).

NOTE:  A column of data of Suboperation (3) is reduced to a three item
row here.
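The relation between Suboperations (3) and (4) can be sketched in Python as
follows.  This is not SOUPAC code; the names one_sample_t and t_from_summary
and the flag use_n are assumptions for illustration.  The first function
reduces a column of raw data to the trio (mean, S.D., N); the second computes
t from such a trio.

```python
import math


def t_from_summary(mean, sd, n, mu=0.0):
    """Suboperation (4) style: t from a (mean, S.D., N) trio."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu) / standard_error


def one_sample_t(x, mu=0.0, use_n=False):
    """Suboperation (3) style: reduce a column to its trio, then test."""
    n = len(x)
    df = n if use_n else n - 1         # N-1 is the usual choice
    mean = sum(x) / n
    sd = math.sqrt((n * sum(v * v for v in x) - sum(x) ** 2) / (n * df))
    return t_from_summary(mean, sd, n, mu)


column = [2, 4, 4, 4, 5, 5, 7, 9]
print(one_sample_t(column, mu=4.0, use_n=True))
print(t_from_summary(5.0, 2.0, 8, mu=4.0))
```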

Suboperation (5):  Test of differences between two or more group means
taken pairwise:  each group is located on a separate storage location
or set of cards.  The following are calculated:

     t value:   $t = (\bar{X}_i - \bar{X}_j) / S_{\bar{x}_i - \bar{x}_j}$

     Mean:   $\bar{X}_i = \sum_{n=1}^{N_i} X_n / N_i$

     Standard Deviation:   $S.D. = \left[ \frac{ N_i \sum X_i^2 - (\sum X_i)^2 }{ N_i \times (d.f.)_i } \right]^{1/2}$

     Pooled Estimate of Variance:

          $S^2 = \left[ \frac{ N_i \sum X_i^2 - (\sum X_i)^2 }{ N_i } + \frac{ N_j \sum X_j^2 - (\sum X_j)^2 }{ N_j } \right] \Big/ (d.f.)$

     Estimate of Standard Error:   $S_{\bar{x}_i - \bar{x}_j} = \left[ \frac{ S^2 (N_i + N_j) }{ N_i N_j } \right]^{1/2}$
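A Python sketch of the two-group computation follows.  It is an illustration,
not the SOUPAC routine.  The names corrected_ss and two_group_t are
hypothetical, and the pooled degrees of freedom are taken here as the sum of
the two groups' degrees of freedom (N_i + N_j - 2, or N_i + N_j when N is
requested); that choice is the usual textbook form and is an assumption, not
something stated in this manual.

```python
import math


def corrected_ss(x):
    """(N*sum(X^2) - (sum X)^2) / N, the corrected sum of squares of a group."""
    n = len(x)
    return (n * sum(v * v for v in x) - sum(x) ** 2) / n


def two_group_t(x_i, x_j, use_n=False):
    """t for the difference between two group means with pooled variance."""
    n_i, n_j = len(x_i), len(x_j)
    # Assumed pooled degrees of freedom: sum of the per-group values.
    df = (n_i + n_j) if use_n else (n_i + n_j - 2)
    s2 = (corrected_ss(x_i) + corrected_ss(x_j)) / df
    standard_error = math.sqrt(s2 * (n_i + n_j) / (n_i * n_j))
    return (sum(x_i) / n_i - sum(x_j) / n_j) / standard_error


print(two_group_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```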

Suboperation (6):  One-way analysis of variance.  The data to be compared
are located on different storage units or sets of cards.  Calculations are
made as follows for each variable:

     $S$ = number of storage units = number of subgroups

     $N_i$ = number of observations in the ith subgroup, $i = 1, \ldots, S$,
             and for each $i$, $j = 1, \ldots, N_i$

     $X_{ij}$ = element in the jth row of the ith subgroup

     $N = \sum_{i=1}^{S} N_i$ = total observations

     Total SS $= \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij}^2 - \left( \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij} \right)^2 / N$

     $\sum_{j} X_{ij}$ = sum for each constant i (i.e. over the subgroup)

     Between SS $= \sum_{i=1}^{S} \left[ \left( \sum_{j=1}^{N_i} X_{ij} \right)^2 / N_i \right] - \left( \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij} \right)^2 / N$

     Within SS = Total SS - Between SS

     Within D.F. = Total D.F. - Between D.F.

     $F = \frac{ BSS / BDF }{ WSS / WDF }$

Suboperation (7):  One way analysis of covariance.  The experimental
(dependent) variable comes first followed by up to 49 covariates.  The
dependent variable is adjusted to the set of covariates, not iteratively
to one covariate at a time.  Subgroups or factor levels are handled as
in analysis of variance, Suboperation (6), i.e. as separate input decks
or temporary storage locations.  Coefficients obtained are means and standard
deviations, test of homogeneity of regression, F-ratio for covariance,
and adjustment coefficients.  For further discussion see Winer, p. 578 ff.

References 

Bryant, E. C., Statistical Analysis.  New York:  McGraw-Hill, 1960.

Snedecor, G. W., Statistical Methods.  Ames:  Iowa State College
Press, 1957.

Winer, B. J., Statistical Principles in Experimental Design.
New York:  McGraw-Hill, 1962.

II.   Restrictions

The maximum number of input variables is 150 except for Suboperation
(2), all possible pairs, and Suboperation (7), covariance, where the
maximum is 50.  Suboperations (5) and (6), tests of differences and analysis
of variance, require the data to be divided into 2 or more subgroups.  The
maximum number of subgroups is 14.

III.   Parameters

T-TEST and ANALYSIS OF VARIANCE

Parameter
Number       Use or Meaning

   1         Suboperations 1-7.  (See above).

   2         Number of subgroups (if applicable).

   3         0 - Count blanks as zeros
             1 - Count blanks as missing data

   4         0 - use N-1 as degrees of freedom
             1 - use N as degrees of freedom (See Special Comments).

Input addresses of subgroups for options 5, 6, and 7 are listed on a
$INP card.  (See section on SOUPAC Input/Output).  The maximum number of
subgroups is 14.  See example for illustration.  If options 1 or 2 are
used, provide only 1 input address.  For option 3, provide two addresses,
the first for the sample being tested, the second for the criterion means.
For option 4 provide two input addresses, the first for the sample being
tested, the second for the criterion means, standard deviations and sample
size.

IV.   Special Comments

If it is desired to use the option of comparing the mean with some
population mean other than zero as in Suboperations (3) and (4), a row
vector of $\mu$ must be included in temporary storage.  This row vector must
be of length N = number of variables.

Most work requires N-1 degrees of freedom.


V.   Examples

T-TEST (1).
$INP(S1).

Paired T-Test on data stored on S1.

T-TEST (2)( )(1)(1).
$INP(S15).

Paired T-Test on data stored on S15 doing test on all possible pairs
checking for missing data and using N degrees of freedom.

T-TEST (3).
$INP(S1)(S3).

T-Test of population means on S1 against criterion means on S3.

T-TEST (5)(2).
$INP(S1)(S2).

T-Test of group means of two groups, one on S1 and the other on S2.



T-TEST (6)(4)(1).
$INP(S1)(C)(C)(S3).

One-way analysis of variance over four groups, one on S1, two from
cards, and one from S3, checking for missing data.


UNRESTRICTED MAXIMUM LIKELIHOOD FACTOR ANALYSIS

Parameter
Number       Use or Meaning

   1         Input Address for correlation matrix.
             SEQUENTIAL 1-15; CARDS are not permitted.

   2         Output Address for final unrotated factor matrix.
             SEQUENTIAL 1-15.  See also Parameter 13.

   3         Input Address for row vector of initial
             estimate of uniqueness.  CARDS, SEQUENTIAL
             1-15 (optional).

   4         Lower bound for number of factors.

   5         Upper bound for number of factors.

   6         Sample size (number of observations) on
             which correlation matrix is based.

   7         Maximum number of iterations.

   8         Probability of chance occurrence, e.g., *1.00*.

   9         1 to print input correlation matrix and partial
             correlation matrices after any variables have
             been removed.

  10         1 to print technical output.

  11         1 to print intermediate results.

  12         1 to punch unrotated factor matrices.

  13         1 to apply a varimax rotation to all factor
             matrices.  If this parameter is used the output
             of parameter 2 will be a rotated factor matrix.

This program has been taken directly from Jöreskog (1967) with his
permission.  Anyone interested in the methods is referred to the references listed
below.  The program is temporarily limited to 75 variables and 30 factors.
Parameters 1, 4, 5, 6, 7, and 8 are required.  Parameter 8 must be enclosed
within asterisks, **, and must have a punched decimal point.

References:

Jöreskog, K. G.  UMLFA - a computer program for unrestricted maximum likelihood
factor analysis.  Research Memorandum 66-20.  Princeton, New Jersey:
Educational Testing Service.  Revised Edition, 1967.

Jöreskog, K. G.  Some contributions to maximum likelihood factor analysis.
Psychometrika, 1967, 32, 443-482.


VARIMAX  FACTOR  ROTATION 


General  Description 

VARIMAX ROTATION is used to redistribute the variance of a factor matrix
(principal axis, centroid, etc.) so that the matrix approaches orthogonal
simple structure.  The varimax scheme maximizes the following criterion
function:

     $$ \sum_{s=1}^{f} \left[ n \sum_{j=1}^{n} \left( a_{js}^2 / h_j^2 \right)^2 - \left( \sum_{j=1}^{n} a_{js}^2 / h_j^2 \right)^2 \right] $$

where  $j$ is the variable index number:  1, ..., n

       $s$ is the factor index number:  1, ..., f

       $a_{js}$ is the factor loading of the jth variable on the sth factor

       $h_j^2$ is the jth variable communality
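The criterion can be evaluated for a given loading matrix with a few lines of
Python.  This sketch only computes the value of the criterion; it does not
perform the rotation and is not the SOUPAC routine.  The function name
varimax_criterion is hypothetical, and treating normalize=False as the "raw"
form (loadings not divided by communalities) is an assumption.

```python
def varimax_criterion(loadings, normalize=True):
    """Sum over factors of n*sum(u^2) - (sum u)^2, where u = a_js^2 / h_j^2
    (or u = a_js^2 when normalize is False)."""
    n = len(loadings)          # number of variables
    f = len(loadings[0])       # number of factors
    total = 0.0
    for s in range(f):
        u = []
        for row in loadings:
            h2 = sum(a * a for a in row) if normalize else 1.0
            u.append(row[s] ** 2 / h2)
        total += n * sum(v * v for v in u) - sum(u) ** 2
    return total


simple = [[1.0, 0.0], [0.0, 1.0]]    # perfect simple structure
mixed = [[0.5, 0.5], [0.5, 0.5]]     # every variable loads on both factors
print(varimax_criterion(simple), varimax_criterion(mixed))
```

The matrix with simple structure gives the larger value, which is the
property the rotation exploits.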

For further discussion see:

H. F. Kaiser, "Computer Program for Varimax Rotation in Factor
Analysis", Educational and Psychological Measurement, Vol. XIX,
No. 3, 1959, pp. 413-420.

Cooley and Lohnes, Multivariate Procedures for the Behavioral
Sciences, New York, John Wiley and Sons, Inc., 1962, pp. 161-3.

Restrictions 

The  input  matrix  for  VARIMAX  ROTATION  must  not  exceed  190  variables 
and  190  factors.   The  number  of  factors  may  be  anything  greater  than  or 
equal  to  2.   Any  factor  matrix  generated  by  a  statistical  system  factor 
analysis  program  is  acceptable  input.  A  matrix  may  also  be  entered  from 
cards. 

Parameters 

The  parameters  for  the  VARIMAX  ROTATION  appear  on  the  program  call 
card.   They  must  follow  the  program  name  in  this  order: 

Parameter
Number       Use or Meaning

   1         Input Address.  CARDS or SEQUENTIAL 1-15.

   2         Output Address.  SEQUENTIAL 1-15 and/or PRINT.

   3         The presence of a number greater than 0 in this
             parameter indicates that the communalities should be
             printed.

   4         0 or blank for normal VARIMAX.  1 if raw VARIMAX
             is desired.

SOUPAC  (Statistically  Oriented  Users  Programming  and  Consulting) 
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APPENDIX  A 
Using  Owner  Storage  Devices  With  SOUPAC 
Introduction 

User data storage media for our 360 users will typically be dismountable
tapes or disk packs, or they will be user data sets on some dismountable or
permanently mounted disk pack.  Whatever the source, it must be described fully
so that the machine can access it.  This information is needed as arguments
to certain JCL keywords.  These parameters will be treated in a moment.  First
we want to consider the dismountable devices, tapes and disk packs.  User disk
packs are not as frequent as user tapes so we shall ignore them for the moment.

There are several "physical" characteristics of tapes which pertain to recording
techniques which must be considered before we get into the "logical" attributes
described by the JCL parameters in the next section.
Physical tape parameters are density, number of tracks written and recording
mode.  Density can be 200, 556, 800, or 1600 bits per inch, abbreviated BPI.  Thus
a tape written at 800 BPI can store less information per inch than one written at
1600 BPI.  Note that a tape to be used on the 360 must be written at 1600 BPI.  It
is true that we have a dual density tape drive, namely 800/1600 BPI, accessible to
the 360 and that there is a density parameter in JCL enabling one to specify the
density, but since the 1600 mode is so much faster, generally, a tape to be used
on the 360 should be at 1600 BPI.

The number of tracks used in recording is 7 or 9.  The 360 usually uses a 9
track recording mode; the 7094 used the 7 track mode.  Recording mode is unformatted
or formatted but we usually speak of binary tapes or card image tapes, respectively.
If it is a card image tape it is BCD or EBCDIC.

Thus, the most desirable input tape for the 360 is 1600 BPI, 9 track, 360 binary
or EBCDIC.  If one or more of these characteristics is different, then it should be
changed by doing a tape to tape copy with the appropriate conversion.  Asking for
special exceptions is not advised; i.e. do not try getting permission to write
800 BPI except in very special circumstances.

At this installation we can do the following conversions:

     556 BPI to 800 BPI to 1600 BPI and vice versa
     7 track to 9 track and vice versa
     BCD to EBCDIC and vice versa

Note that in going from EBCDIC to BCD some characters may be lost.  These
conversions are at the moment done by using the appropriate IBM-supplied utility
program; see a consultant to help you use the utilities.

We can also do 360 binary to EBCDIC conversion and vice versa but we do not
have the facility to do anything about a 7094 binary tape.  Not that it cannot be
done, but of the two alternative means we know of, one uses IBM conversion routines
which would require hardware changes; the other requires a non-trivial program
which we do not have and even when we get it, it will be slow and clumsy.

We repeat that there are JCL keywords to specify density, number of tracks,
and parity but it is too slow and clumsy for the 360/75.

Clearly the greater the density, the more information can be stored on a tape
and the faster it can be read or written.  The higher density does however also
require a better tape.  One should not write 1600 BPI on a tape tested at 800 BPI.
Further, only a 1600 BPI tested tape can be reliably written with 9 tracks.  The
test BPI information is usually on the tape reel in the form of a comment saying
tested at 1600 BPI or 800 BPI.  Note that 1600 BPI and 3200 FCI mean the same thing.

There are several things to keep in mind when writing a user tape in binary
or card image mode.  Card image, line image, and formatted tape all mean essentially
the same and mean exactly what their names imply.  Clearly all these are read and
written with a format.  Binary and unformatted tapes read and write faster than
formatted tapes but may take up more space than formatted tapes.  How much data
[illegible] and how important time is will probably determine [illegible].
A formatted tape may be read many different ways using formats
to pick off variables.  An unformatted tape has variables immediately usable but
perhaps not everything the user has on his cards.  He can put everything he may
ever want to use on his unformatted tape but he may also be dragging some
superfluous variables around every time he does a read.

Choice, though possible, is typically not available for number of tracks on
tapes written by the 360. You get 9 track tapes.

So, the usual 360 output tape is 1600 BPI, 9 track, formatted or unformatted.
SOUPAC writes 1600 BPI, 9 track, unformatted tapes. To get a 1600 BPI, 9 track,
formatted tape out of SOUPAC the easiest thing to do at the moment is to over-ride
the punch statement in the proc, i.e., FT07F001. To input a formatted tape to
SOUPAC, over-ride the procedure in an appropriate way and use the MATRIX INPUT
instruction. To input a SOUPAC written unformatted tape just over-ride the
procedure.

Over-riding the SOUPAC procedure for using user data storage devices
SETUP or not SETUP

Users data exists on permanently mounted volumes such as U1DCS1 or on dismountable
volumes such as UIUSRH or DK0013 or user tapes or disk packs. In this context,
a volume is that physical disk pack or tape which gets mounted onto a disk drive
or tape drive. If a volume is permanently mounted you need do nothing at this setup
stage. If it is dismountable, you need a /*SETUP card for each such volume among
your /* cards, if any, after the /*ID card. Tapes always require a /*SETUP card.
For data sets on disk packs, determine from the user whether it is dismountable or
not. The form of a /*SETUP card is as follows:

/*SETUP UNIT=type,ID=name

                 ,ID=(name,label)

Type  is  DISK  or  TAPE:   the  choice  is  obvious. 

For tapes, name under ID will be 0XXXXX for an owner tape where XXXXX is some
number, or PXXXXX or LXXXXX where P and L are pool and lease tapes
rented from DCS.

A  tape  may  be  internally  labeled  or  not.   If  it  is  not  labeled  the  label 
argument  is  NL,  meaning  no  label.   If  it  is  labelled  the  argument  will  be  SL, 
meaning  standard  label.   If  it  is  standard  labelled,  later  in  the  over-ride  of 
the  proc  you  will  need  to  know  the  DSNAME. 

If no label argument is used for tapes the default is SL. For dismountable
disk  packs  at  DCS,  the  volume  name  is  all  that  is  needed  since  they  are  all 
internally  standard  labelled.   You  must  find  out  from  a  user  if  his  tape  or  non- 
DCS  disk  pack  is  internally  labelled  or  not. 

Over-riding  the  SOUPAC  proc 

Assuming you now have the user volume mounted, you must still tell SOUPAC
about it. Suppose that the user volume is a sequential data set (or possibly
more than one sequential data set). Usually it will be. Direct access data
sets will be discussed later. You must assign it a SOUPAC sequential data set
reference number. Numbers 11 through 49 are available and correspond to the
SOUPAC addresses S1 through S39 (or T1 through T39). Which data set reference
number you use does not make any difference. For sake of example let us choose
S1 and over-ride FT11F001 in the SOUPAC proc. Our JCL should look like this to
this point:


/*ID
/*SETUP UNIT=type,ID=(name,label)
// EXEC SOUPAC
//FT11F001 DD indication string
//SYSIN DD *
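
The correspondence between SOUPAC sequential addresses and data set reference numbers described above can be sketched in a few lines of Python. This is only an illustration: the function name dd_name is hypothetical and is not part of SOUPAC. It assumes only what the text states, namely that addresses 1 through 39 map to reference numbers 11 through 49 and that the DD name has the form FTnnF001.

```python
import re

def dd_name(address: str) -> str:
    """Map a SOUPAC sequential address (S1..S39 or T1..T39) to its DD name."""
    match = re.fullmatch(r"[ST](\d{1,2})", address.strip().upper())
    if match is None:
        raise ValueError(f"not a sequential address: {address!r}")
    number = int(match.group(1))
    if not 1 <= number <= 39:
        raise ValueError(f"address number out of range 1-39: {number}")
    # Reference numbers 11 through 49 correspond to addresses 1 through 39.
    return f"FT{number + 10:02d}F001"

print(dd_name("S1"))   # FT11F001
print(dd_name("T39"))  # FT49F001
```

So a program that reads from T3 needs an over-ride of FT13F001, and one that reads from S39 needs an over-ride of FT49F001.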


The indication string has some or all of the parameters listed below. Use all
of those where an option is mentioned.


UNIT=type        type is the same as that on the SETUP card;
                 if no setup then it is DISK

VOL=SER=name     name is the same as that on the SETUP card;
                 if no setup, it is the name of a permanently
                 mounted volume

LABEL=(,NL)      same as label argument if used on SETUP card;
LABEL=(,SL)      if not used on SETUP card, can be omitted

DISP=            arguments are NEW, OLD, KEEP, DELETE, CATLG,
                 UNCATLG, PASS singly or in combination depending
                 on what you are doing

DSNAME=          some name; use if you have a standard labelled
                 tape; otherwise optional depending on what you
                 are doing

SPACE=(kind,(primary,secondary))
                 not necessary for an OLD, PASS, or CATLG data set;
                 necessary for a NEW data set; not necessary for a
                 NEW user tape data set

   kind=TRK      a TRK holds 7200 bytes
        CYL      a CYL has 20 tracks
                 there are other kind parameters, but you probably
                 should not be using them

   primary       number of space units of the given kind you initially
                 request; you get this space at once; if it is
                 not possible to allocate you get a space
                 unavailable message

   secondary     if you run off the end of your primary allocation
                 you get 15 extents of the number you specify as
                 secondary of the space units you specify as kind.
                 If the machine cannot give you 15 extents or
                 if after 15 extents you still need space you get
                 a space unavailable message

DCB=arg 1=arg 2
DCB=(arg 1, arg 2, arg 3)
                 optional depending on what you are doing

DCB arguments are as follows

RECFM=  F   fixed record format (used for formatted I/O)
        V   varying record format (usually used for unformatted I/O)
        U   undefined record format (used for formatted
            or unformatted I/O)
        B   blocked record format
        A   alphameric record format; usually just on SYSOUT

        These may be used in combination as in RECFM=FB.
        Note: if B is used you must also use LRECL and BLKSIZE.

LRECL=n     n is the logical record length in bytes
BLKSIZE=m   m is the block size in bytes

If no DCB is specified you get the OS default DCB. You may not necessarily
want this. Ascertain from the user what DCB he has and use it. For more
information on DCB's check your FORTRAN user's guide. But we mention several
pitfalls already encountered and several points of information. SOUPAC by default
uses:

DCB=(RECFM=V,LRECL=796,BLKSIZE=800)

If you write a formatted data set with DISP=PASS you get RECFM=F whether you
specify it or not and SOUPAC, unless you specify DCB, will kick you off with the
message that it has an F or U RECFM but is expecting V. If you had a tape written
with format conversion in another job and did not specify DCB=RECFM=F you will
probably get an illegal decimal character message. If you give SOUPAC a data
set with DCB=RECFM=F some way or another, SOUPAC will be unable to write on it
since it writes unformatted unless you use the MATRIX PUNCH instruction and
over-ride FT07F001. If you over-ride FT07F001 remember you get a SOUPAC DATA card
image and an END# card image. If you write large formatted strings or many
variables, you should check on default LRECL and BLKSIZE. If your record is larger
than LRECL, you just spill onto the next logical record and the machine will worry
about it. If your record is larger than BLKSIZE, you get at best that piece of
the record that fits into BLKSIZE and the rest is lost. There are several tricky
things that are going on here, but usually you need not worry about them. If
your record is shorter than LRECL, then part of LRECL is unused and you may be
wasting space. This becomes important when you start getting space unavailable
or end of volume messages. Furthermore it is inefficient, though it is often much
more convenient just to take the defaults. Again, consult your FORTRAN user's
guide for more details.
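
The three record-size cases described above can be summarized in a small Python sketch. The function name check_record is hypothetical. The sketch only restates the rules given in the text; it does not model anything the operating system does beyond them. The defaults are the SOUPAC default LRECL and BLKSIZE quoted above.

```python
def check_record(record_bytes: int, lrecl: int = 796, blksize: int = 800) -> str:
    """Describe what the text says happens to a record of the given size."""
    if record_bytes > blksize:
        # Larger than BLKSIZE: the excess is lost.
        return "truncated: only the part that fits into BLKSIZE is kept"
    if record_bytes > lrecl:
        # Larger than LRECL but within BLKSIZE: continues on the next record.
        return "spills onto the next logical record"
    if record_bytes < lrecl:
        # Shorter than LRECL: the remainder is wasted space.
        return f"fits, leaving {lrecl - record_bytes} bytes of LRECL unused"
    return "fits exactly"

print(check_record(80))    # fits, leaving 716 bytes of LRECL unused
print(check_record(798))   # spills onto the next logical record
print(check_record(1000))  # truncated: only the part that fits into BLKSIZE is kept
```

The first case shows why taking the defaults for card image data can waste a good deal of space.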

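As a rough aid for choosing SPACE values, the following Python sketch computes the most space a request could ever provide. It uses only the figures stated in the SPACE description above: 7200 bytes per track, 20 tracks per cylinder, and 15 extents of the secondary amount. The function name max_space_bytes is hypothetical.

```python
TRACK_BYTES = 7200      # a TRK holds 7200 bytes (figure from the text)
TRACKS_PER_CYL = 20     # a CYL has 20 tracks (figure from the text)
MAX_EXTENTS = 15        # secondary allocations available (figure from the text)

def max_space_bytes(kind: str, primary: int, secondary: int) -> int:
    """Upper bound in bytes for SPACE=(kind,(primary,secondary))."""
    unit_bytes = {"TRK": TRACK_BYTES, "CYL": TRACK_BYTES * TRACKS_PER_CYL}[kind]
    # Primary space is allocated at once; secondary space arrives in extents.
    return unit_bytes * (primary + MAX_EXTENTS * secondary)

print(max_space_bytes("TRK", 10, 5))  # 612000
print(max_space_bytes("CYL", 1, 1))   # 2304000
```

If the data set needs more than this bound, the job ends with a space unavailable message, so either the primary or the secondary amount should be raised.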

EXAMPLES

Write onto a users tape a SOUPAC data set in binary


/*ID
/*SETUP UNIT=TAPE,ID=(0XXXXX,NL)
// EXEC SOUPAC
//FT12F001 DD UNIT=TAPE,VOL=SER=0XXXXX,LABEL=(,NL),DISP=(NEW,KEEP)
//SYSIN DD *


TRANSFORMATIONS(]A)(T2)(13).


Write onto a users tape a SOUPAC data set in EBCDIC


/*ID
/*SETUP UNIT=TAPE,ID=(0XXXXX,NL)
// EXEC SOUPAC
//FT07F001 DD UNIT=TAPE,VOL=SER=0XXXXX,LABEL=(,NL),DISP=(NEW,KEEP)
//SYSIN DD *
MATRIX.
PUNCH(JA)"(format)".


Note that you will get the SYSPUNCH default DCB which limits you to card images,
i.e., 80 bytes per record. For larger records use your own DCB.

Read  a  users  formatted  tape  into  SOUPAC 

/*ID
/*SETUP UNIT=TAPE,ID=(0XXXXX,SL)
// EXEC SOUPAC
//FT49F001 DD UNIT=TAPE,VOL=SER=0XXXXX,LABEL=(,SL),DISP=(OLD,KEEP),
// DSN=USER.PYXYY.TPDATA,DCB=(RECFM=FB,LRECL=80,BLKSIZE=800)
//SYSIN DD *
MATRIX.
INPUT(S39)(S1)(nrow)(ncol)"(format)".

where (nrow) is the number of observations and (ncol) is the
number of variables per observation in the data matrix.

Note that you are reading a card image tape with 10 card images per physical
record, i.e., blocking factor of 10 to 1.
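
The blocking factor follows directly from the DCB: it is BLKSIZE divided by LRECL for a fixed blocked data set. A minimal Python sketch, with hypothetical helper names parse_dcb and blocking_factor, assuming the DCB is written in the parenthesized form used in the example above:

```python
def parse_dcb(dcb: str) -> dict:
    """Parse a 'DCB=(KEY=VALUE,...)' string into a dictionary."""
    body = dcb.strip()
    if body.upper().startswith("DCB="):
        body = body[4:]
    body = body.strip("()")
    return dict(item.split("=", 1) for item in body.split(",") if item)

def blocking_factor(dcb: str) -> int:
    """Logical records per physical record for a fixed blocked data set."""
    fields = parse_dcb(dcb)
    if fields.get("RECFM") != "FB":
        raise ValueError("blocking factor is computed here only for RECFM=FB")
    lrecl, blksize = int(fields["LRECL"]), int(fields["BLKSIZE"])
    if blksize % lrecl != 0:
        raise ValueError("BLKSIZE is not a multiple of LRECL")
    return blksize // lrecl

print(blocking_factor("DCB=(RECFM=FB,LRECL=80,BLKSIZE=800)"))  # 10
```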


Read a users unformatted tape into SOUPAC written by SOUPAC


/*ID
/*SETUP UNIT=TAPE,ID=(0XXXXX,NL)
// EXEC SOUPAC
//FT13F001 DD UNIT=TAPE,VOL=SER=0XXXXX,DISP=(OLD,KEEP),LABEL=(,NL)
//SYSIN DD *
COR(T3)(P)(P).


note  this  presupposes  that  SOUPAC  wrote  the  tape  originally 

Read  a  users  unformatted  tape  into  SOUPAC  not  written  by  SOUPAC 

/*ID
/*SETUP UNIT=TAPE,ID=(0XXXXX,NL)
// EXEC SOUPAC
//FT11F001 DD UNIT=TAPE,VOL=SER=0XXXXX,DISP=(OLD,KEEP),LABEL=(,NL),
// DCB=(RECFM=V,LRECL=400,BLKSIZE=404)
//SYSIN DD *
#OLD(T1)(NROW)(NCOL).
MATRIX.
MOVE(T1)(T2).

Note first of all you need the #OLD card to simulate a SOUPAC header record.
Further, each record must contain three extra integer*4 words at the front of
the record to correspond to SOUPAC records. Also, if it was written with default
DCB you do not need to specify a DCB. Finally, the #OLD card is good to use if
you are having header record problems in SOUPAC anyway.

Read a user data set from a dismountable disk

/*ID
/*SETUP UNIT=DISK,ID=DK0291
// EXEC SOUPAC
//FT12F001 DD UNIT=DISK,VOL=SER=DK0291,DSN=USER.PXXX.name,
// DISP=(OLD,KEEP),DCB=(RECFM= ,LRECL= ,BLKSIZE= )
//SYSIN DD *

Use INPUT if formatted; otherwise go through the appropriate sequence for
unformatted reads. To write the data set, use the SPACE parameter; change the
DISP parameter appropriately; and consider what kind of write you want and fix
the over-ride appropriately.

For your information:

LABEL=(,NL) or LABEL=(,SL) is equivalent to LABEL=(1,NL) or LABEL=(1,SL),
which in fact is file 1 on the volume. Thus to access file 2 you would
write LABEL=(2,NL) or LABEL=(2,SL), or in general, LABEL=(n,XX) where n is
the file number and XX is NL or SL.

If you intend to generate data in SOUPAC and wish to use the data in any context
other than SOUPAC, it is highly recommended that you do not simply save the binary
image file but generate a formatted record data set. Formatted record data sets
interface more completely with other systems.


(Additions  to  the  MATRIX  program) 

ABSOLUTE  VALUE  (mnemonic:   ABS) 

The  ABSOLUTE  VALUE  operation  has  two  address  parameters,  an  input 
address  and  an  output  address.   The  absolute  value  of  each  element  of 
the  input  matrix  is  taken  and  the  result  goes  to  the  output  address. 
Example:

ABSOLUTE VALUE (SEQ3)(SEQ4).

COUNT  (mnemonic:   COU) 

The  COUNT  operation  has  three  operands;  an  input  address,  an  output  address 
and  an  option  indicator. 

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single 
row  vector  containing  a  count  of  the  number  of  elements,  excluding 
missing  data,  of  each  column  of  the  input  matrix.   Specifying  no  option 
is  equivalent  to  specifying  option  0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a  single  column 
vector  containing  a  count  of  the  number  of  elements,  excluding  missing 
data,  of  each  row  of  the  input  matrix. 

If the option is specified as any number other than 0 or 1, a single
element matrix is output which contains a count of the number of elements,
excluding missing data, over the entire matrix. Examples:

COUNT  (SEQ1)(SEQ4).
COUNT  (SEQ5)(SEQ3)(1).
COUNT  (SEQ2)(SEQ3)(2). 
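
The three COUNT options can be illustrated with a short Python sketch. It is not SOUPAC code. The function name count is hypothetical, matrices are represented as lists of rows, and missing data is represented by None, which is an assumption made only for this illustration.

```python
def count(matrix, option=0):
    """Count non-missing elements by column (0), by row (1), or overall."""
    present = [[0 if x is None else 1 for x in row] for row in matrix]
    if option == 0:
        # Single row vector: one count per column.
        return [[sum(col) for col in zip(*present)]]
    if option == 1:
        # Single column vector: one count per row.
        return [[sum(row)] for row in present]
    # Any other option: single-element matrix with the overall count.
    return [[sum(sum(row) for row in present)]]

data = [[1.0, None, 3.0],
        [4.0, 5.0, None]]
print(count(data))     # [[2, 1, 1]]
print(count(data, 1))  # [[2], [2]]
print(count(data, 2))  # [[4]]
```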

DIMENSION  (mnemonic:   DIM) 

The  DIMENSION  operation  has  two  address  parameters,  an  input  address 
and  an  output  address.   The  number  of  rows  and  the  number  of  columns 
of  the  input  matrix  are  used  to  form  the  first  and  second  elements 
respectively  of  a  two  element,  single  row  matrix  which  is  output  to 
the  output  address.   Example: 

DIM  (SEQ2)(SEQ1). 

EXPAND  (mnemonic:   EXP) 

The  EXPAND  operation  takes  the  first  row  of  the  first  input  matrix  and 
repeatedly  outputs  that  same  row  to  the  output  address  the  number  of 
times  specified  by  the  second  parameter. 

The  second  parameter  can  be  either  a  floating  point  number  in  which  case 
the  input  row  is  copied  to  the  output  address  the  number  of  times  specified 
by  the  integer  portion  of  the  floating  number;  or  the  second  parameter 
can  be  an  input  address  in  which  case  the  input  row  is  copied  to  the 
output address until the output matrix has the same number of rows as
the second input matrix. The third parameter is the output address.
Examples:

EXPAND  (SEQ1)(SEQ2)(SEQ3). 
EXPAND  (SEQ1)*55*(SEQ4).
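
The two forms of the second EXPAND parameter can be sketched in Python as follows. The function name expand is hypothetical and matrices are represented as lists of rows. The sketch returns the result rather than writing it to an output address.

```python
def expand(first, second):
    """Repeat the first row of `first`; `second` is a number or a matrix."""
    row = list(first[0])
    if isinstance(second, (int, float)):
        # A number: use its integer portion as the repeat count.
        times = int(second)
    else:
        # A matrix: produce as many rows as the second input matrix has.
        times = len(second)
    return [list(row) for _ in range(times)]

print(expand([[1, 2], [3, 4]], 3.7))             # [[1, 2], [1, 2], [1, 2]]
print(expand([[1, 2]], [[0], [0]]))              # [[1, 2], [1, 2]]
```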

MAXIMUM  (mnemonic:   MAX) 

The  MAXIMUM  operation  has  three  operands;  an  input  address,  an  output 
address,  and  an  option  indicator. 

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single  row 
vector  containing  the  maximum  element  of  each  column  of  the  input  matrix. 
Specifying  no  option  is  equivalent  to  specifying  option  0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a  single  column 
vector  containing  the  maximum  element  of  each  row  of  the  input  matrix. 

If the option is specified as any number other than 0 or 1, a single
element  matrix  is  output  which  contains  the  maximum  element  of  the  entire 
matrix.   Examples: 

MAX  (SEQ1)(SEQ3).
MAX  (SEQ1)(SEQ2)(2). 

MINIMUM  (mnemonic:   MIN) 

The  MINIMUM  operation  has  three  operands;  an  input  address,  an  output 
address,  and  an  option  indicator. 

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single  row 
vector  containing  the  minimum  element  of  each  column  of  the  input  matrix. 
Specifying  no  option  is  equivalent  to  specifying  option  0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a  single 
column  vector   containing  the  minimum  element  of  each  row  of  the  input 
matrix. 

If  the  option  is  specified  as  any  number  other  than  0  or  1,  a  single 
element  matrix  is  output  which  contains  the  minimum  element  of  the  entire 
matrix.   Examples: 

MIN  (SEQ1)(SEQ3). 
MIN  (SEQ1)(SEQ2)(2). 
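
MAXIMUM and MINIMUM share the same option convention, and the SUM operation described next follows it as well. A minimal Python sketch of that convention, with a hypothetical function name reduce_matrix and matrices represented as lists of rows (missing data is not modelled here):

```python
def reduce_matrix(matrix, func, option=0):
    """Apply func by column (0), by row (1), or over the whole matrix."""
    if option == 0:
        # Single row vector: one result per column.
        return [[func(col) for col in zip(*matrix)]]
    if option == 1:
        # Single column vector: one result per row.
        return [[func(row)] for row in matrix]
    # Any other option: single-element matrix over all elements.
    return [[func([x for row in matrix for x in row])]]

data = [[1, 5, 3],
        [4, 2, 6]]
print(reduce_matrix(data, max))      # [[4, 5, 6]]
print(reduce_matrix(data, min, 1))   # [[1], [2]]
print(reduce_matrix(data, sum, 2))   # [[21]]
```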

SUM  (mnemonic:   SUM) 

The  SUM  operation  has  three  operands;  an  input  address,  an  output  address, 
and  an  option  indicator. 

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single  row 
vector  containing  the  column  sum  of  each  column  of  the  input  matrix. 
Specifying  no  option  is  equivalent  to  specifying  option  0. 


If  option  1  is  specified,  the  resulting  output  matrix  is  a  single  column 
vector  containing  the  row  sum  of  each  row  of  the  input  matrix. 

If  the  option  is  specified  as  any  number  other  than  0  or  1,  a  single 
element  matrix  is  output  which  contains  the  sum  of  all  elements  over 
the  entire  matrix.   Examples: 

SUM  (SEQ1)(SEQ3).
SUM  (SEQ1)(SEQ2)(2).